CONSENSUS CONFIGURATIONAL BIAS MONTE CARLO METHOD AND SYSTEM
FOR PHARMACOPHORE STRUCTURE DETERMINATION

This specification includes in Sec. 8 computer program
listings that are exemplary embodiments of the computer
programs of this invention. 

A portion of the disclosure of this patent document 
contains material which is subject to copyright protection. 
The copyright owner has no objection to the facsimile 
reproduction by anyone of the patent disclosure, as it
appears in the Patent and Trademark Office patent files and
records, but otherwise reserves all copyright rights
whatsoever.

This invention was made with Government support under
Grant number 1R43CA62752-01 awarded by the National
Institutes of Health. The Government has certain rights in
the invention.

1. FIELD OF THE INVENTION
The field of this invention is computer assisted methods 
of drug design. More particularly, the field of this
invention is computer implemented smart Monte Carlo methods
which utilize NMR and binders to a target of interest as
inputs to determine highly accurate molecular structures that
must be possessed by a drug in order to achieve an effect of
interest. Illustrative U.S. Patents are 5,331,573 to Balaji
et al., 5,307,287 to Cramer, III et al., 5,241,470 to Lee et
al., and 5,265,030 to Skolnick et al.

2. BACKGROUND
Protein interactions have recently emerged as a 
fundamental target for pharmacological intervention. For 
example, the top two major uncured diseases in the United 
States are atherosclerosis (the principal cause of heart 
attack and stroke) and cancer. These diseases are
responsible for greater than 50% of all U.S. mortality and
cost the U.S. economy over $200 billion per year. A
consistent picture of these diseases, which has gradually
emerged during the past ten years of molecular biological and
medical research, views both as triggered by disordering of
specific molecular recognition events that take place among
sets of proteins present in both the normal and disease
states.

Hierarchical, organized patterns of protein-protein
interactions are often referred to as "pathways" or
"cascades." At the molecular level, cancers have been
determined to be the deregulation of pathways of interacting
proteins responsible for guiding cellular growth and
differentiation. During the past year, individual cellular
events have been organized into explanations of how a cell's
behavior is controlled by its environment and how
communication pathway errors lead to uncontrolled
proliferation and cancer. Disruptions in similar pathways are
responsible for the proliferation of blood vessel walls
marking the atherosclerotic disease state (Cook et al., 1994,
Nature 369:361-362; Hall, 1994, Science 264:1413-1414; Ross,
1993, Nature 362:801-809; Zhang et al., 1993, Nature
364:308-313).

Particular protein-substrate interactions, whether inhibited
or stimulated, have long been known drug targets. Many
important anti-hypertensives, neurotransmitter analogues,
antibiotics, and chemotherapeutic agents act in this fashion.
Captopril, an antihypertensive drug, was designed based on
its ability to antagonize a focal blood-pressure-regulating
enzyme.

Proteins involved in biological processes, either as
part of protein-protein pathways or as enzymes, are composed
of domains (Campbell et al., 1994, Trend. BioTech.
12:168-172; Rothberg et al., 1992, J. Mol. Biol.
227:367-370). Domains, or regions of the protein of stable
three dimensional (secondary and tertiary) structure, play
several major roles, including providing on their surface
small regions ("examples of targets"), where proteins and
substrates are able to bind and interact, and functioning as
structural units holding other domains together as part of a
large protein (tertiary and quaternary structure). The
interaction surface of a domain or target is fundamental to
determining binding specificity. Targets are often small
enough that the principal contribution to the binding energy
is short range, highly localized to several amino acids
(Wells, 1994, Curr. Op. Cell Biol. 6:163-174). The
functional specificity of targets and domains, responsible
for the incredible diversity of cellular function, ultimately
rests with the arrangement of amino acid side chains forming
their interaction surfaces, or targets (Marengere et al.,
1994, Nature 369:502-505).

It appears, therefore, that pharmacological
intervention affecting the specific protein-protein and
protein-substrate recognition events occurring at protein
targets is of fundamental importance, particularly for
effective drug design.
However, achieving desired pharmacological interventions
in a predictable manner remains as elusive as ever. Early
approaches to drug design depended on the chance observation
of biological effects of a known compound or the screening of
large numbers of exotic compounds, usually derived from
natural sources, for any biological effects. The nature of
the actual protein target was usually unknown.

2.1. TARGET STRUCTURE-BASED APPROACHES TO DRUG DESIGN

Rational approaches to drug design have met with only
limited success. Current rational approaches are based on
first determining the entire structure of the proteins
involved in particular interactions, examining this structure
for the possible targets, and then predicting possible drug
molecules likely to bind to the possible target. Thus the
location of each of the thousands of atoms in a protein must
be accurately determined before drug design can begin.
Direct experimental and indirect computational methods for
protein structure determination are in current use. However,
none of these methods appears to be sufficiently accurate for
drug design purposes according to current rational
approaches.

The primary direct experimental methods for determining
the structure of proteins involved in particular interactions
are X-ray crystallography, relying on the interaction of
electron clouds with X-rays, and liquid nuclear magnetic
resonance (NMR), relying on correlations between polarized
nuclear spins interacting via indirect dipole-dipole
interactions. X-ray methods provide information on the
location of every heavy atom in a crystal of interest
accurate to 0.5-2.0 Å (1 Å = 10^-8 cm). Drawbacks of X-ray
methods include difficulties in obtaining high quality
crystals, expense and time associated with the
crystallization process, and difficulties in resolving
whether or not the structure of the crystalline forms is
representative of the in vivo conformation (Clore et al.,
1991, J. Mol. Biol. 221:47; Shaanan et al., 1992, Science
227:961-964). High resolution, multidimensional, liquid
phase NMR techniques represent an attractive alternative to
the extent that they can be applied in situ (i.e., in aqueous
environment) to the study of small protein domains (Yu et
al., 1994, Cell 76:933-945). However, the analysis of the
various mutual correlations is complex and time consuming,
and the correlations (primarily from the nuclear Overhauser
effect) provide no better accuracy than X-ray methods.
Isotopic enrichment of proteins with 13C and 15N reduces the
time associated with analysis, but at great expense
(Anglister et al., 1993, Frontiers of NMR in Biology
III L2011).

Protein structures determined by any of these current
methods do not predict success in subsequent drug design.
Resolution obtainable either by measurement or computation,
generally 0.5-2 Å, has often been found to be inadequate for
effective direct drug design, or for selection of a lead
compound from organic compound libraries. The resolution
required to understand both drug affinity and drug
specificity, although not precisely known, is probably
measured in fractions of an Å, down to 0.1 Å (MacArthur et
al., 1994, Trend. BioTech. 12:149-153). This accuracy
appears to be beyond the capabilities of many current
methodologies.

Prior research has identified tools which, although
promising, cannot be used in a coordinated manner for drug
design. One promising measurement approach with speed,
simplicity, accuracy, and the ability to carefully control
the measurement environment is rotational echo double
resonance (REDOR) NMR, a type of solid state NMR (Gullion
and Schaefer, 1989, J. Magnetic Resonance 81:196; Holl et
al., 1993, J. Am. Chem. Soc. 115:238-244). REDOR accuracy can
be better than the 0.1 Å believed to be sufficient for direct
drug design. However, since REDOR measures only a few selected
distances, it is not usable in drug design methods which
depend on the initial determination of the complete structure
of the protein containing the target of interest.

Once a target's structure is determined by the above
methods, most rational drug design paradigms call for the
prediction of small drug structures that will bind (or dock)
to the target. This prediction is generally done by
computational methods, of which several are in current use.
Most seek to predict the position of all the thousands of
atoms in a drug structure. Purely ab initio computational
approaches to high resolution structure analysis, such as
quantum statistical mechanics and molecular dynamics, require
prohibitive computing resources. To apply either approach,
the potential energy, or Hamiltonian, of the entire system
must be known. Statistical mechanics provides an expression
for the probability of any given protein configuration as a
ratio of partition functions. Proper quantum statistical
mechanics required for an exact evaluation of full protein
partition functions is not currently computationally
feasible, as it would involve many thousands of atoms
including the target, the protein, and the aqueous
environment. The application of even simple, approximate
quantum statistical mechanics to simple systems in aqueous
environments is currently a non-trivial task (Chandler, 1991,
in Liquids, Freezing, and Glass Transition, Elsevier, NY, p.
195). Molecular dynamics computes the dynamics of a
molecule's motion in time. Computing the atomic dynamics of
all of the perhaps thousands of atoms of a protein is an
extreme computational burden. Only picoseconds, or at most a
few nanoseconds, of molecular time can be simulated, which is
insufficient to determine a high resolution, equilibrium
structure (Smit et al., 1994, J. Phys. Chem. 98:8442-8452).
In any case, most of the information determined is wasted,
since only the structures of the protein binding target are of
interest in drug design.

Further, current approximate computational techniques
for protein structure determination are in need of greater
accuracy or efficiency. The most common techniques depend on
Molecular Dynamics or Monte Carlo methods (Nikiforovich,
1994, Int. J. Peptide Protein Res. 44:513-531; Brunger and
Karplus, 1991, Acc. Chem. Res. 24:54-61). These methods
randomly alter initial molecular structures by generating
simulated thermal perturbations, and then average the
ensemble of results to determine a final structure. The
generated perturbation must preserve all structural
constraints and be energetically favorable. If either
condition is not met, the perturbation is discarded.
Current Monte Carlo methods applied to constrained protein
structure determinations productively use only approximately
1 out of 10^5 perturbed structures generated (Siepmann et al.,
1993, Nature 365:330-332). This extreme waste of computer
resources results in time consuming, low resolution structure
determinations.
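
The perturb, test, and average loop described above can be
illustrated with a minimal Python sketch. It is not the method of
the invention: it is a plain Metropolis Monte Carlo step on a toy
one-dimensional chain, and the potential, the end-to-end
constraint, and every name in it (energy, satisfies_constraints,
metropolis_step, run) are assumptions made only for illustration.

```python
import math
import random

def energy(x):
    # Toy potential (assumed): harmonic "bonds" with rest length 1.0.
    return sum((x[i + 1] - x[i] - 1.0) ** 2 for i in range(len(x) - 1))

def satisfies_constraints(x, end_to_end=3.0, tol=0.05):
    # Hard structural constraint (assumed): fixed end-to-end distance.
    return abs((x[-1] - x[0]) - end_to_end) <= tol

def metropolis_step(x, beta, step, rng):
    # Perturb one randomly chosen coordinate.
    trial = list(x)
    i = rng.randrange(len(x))
    trial[i] += rng.uniform(-step, step)
    if not satisfies_constraints(trial):
        return x, False          # discarded: violates a constraint
    d_e = energy(trial) - energy(x)
    if d_e <= 0.0 or rng.random() < math.exp(-beta * d_e):
        return trial, True       # accepted by the Metropolis criterion
    return x, False              # discarded: energetically unfavorable

def run(n_steps=5000, beta=5.0, step=0.2, seed=1):
    rng = random.Random(seed)
    x = [0.0, 1.0, 2.0, 3.0]     # starting structure obeys the constraint
    accepted = 0
    total = [0.0] * len(x)
    for _ in range(n_steps):
        x, ok = metropolis_step(x, beta, step, rng)
        accepted += ok
        total = [t + xi for t, xi in zip(total, x)]
    mean = [t / n_steps for t in total]   # ensemble-averaged structure
    return x, mean, accepted / n_steps
```

Calling run() returns the last structure, the averaged structure,
and the fraction of perturbations that were used. Tightening tol
in this toy lowers that fraction, which is the kind of waste the
paragraph above describes for constrained protein structures.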

To summarize, existing rational drug design methods
based on identification of target structure fail to reliably
yield drug molecules due to experimental structure
determination difficulties and computational difficulties
associated with predicting drug structures with ill-defined
Hamiltonians.

2.2. DIVERSITY-BASED APPROACHES TO DRUG DESIGN

Another method for exploring protein target interactions
utilizes "recognition systems" which comprise huge libraries
of related molecules (Clarkson et al., 1994, Trend. BioTech.
12:173-184). From such a library only those members binding
to the target of interest are selected. Such recognition
systems must encompass the structural diversity of protein
targets while being amenable to the selection of
lead compounds for drug design. Antibodies are one classic
example of such a system that certainly meets the recognition
requirement. Unfortunately, there is a need to determine the
antibody structures needed for lead compound selection more
rapidly and accurately. While about 2000 recognition regions
have been sequenced, only about 23 in the Brookhaven Protein
Structural Database have structures determined to even within
2 Å (Rees et al., 1994, Trends in Biotech. 12:199-206).

Promising recognition systems at the opposite extreme
comprise huge libraries of small peptides. The small
peptides must be sufficiently diverse so that they attain a
level of affinity and specificity similar to that obtained by
protein domains. Given the role peptides play in nature,
this condition can be met by surprisingly small structures,
with 6 to 12 amino acids. However, linear peptides are either
unstructured or weakly structured at room temperature in
aqueous solutions (Alberg et al., 1993, Science 262:246;
Skalicky et al., 1993, Protein Science 10:1591-1603). From a
practical viewpoint, linear peptides must be constrained to
reduce their degrees of freedom (reduced conformational
entropy) and to increase their chances for strongly binding.
These constraints, or scaffolds, limit the range of stable
conformations and make determining the bound structure more
straightforward (Olivera et al., 1990, Science 249:259; Tidor
et al., 1993, Proteins: Structure Function and Genetics
15:71).

Methods are now available to create such libraries and
to select library members that recognize a specific protein
target. The production of constrained peptide diversity
libraries requires synthesizing oligonucleotides with the
desired degeneracy to code for the peptides and ligating them
into selection vectors (Goldman et al., 1994, Bio/Tech.
10:1557-1561). Once a constrained structured diversity
library is created, it is a source from which to select
specific members that bind to a target of interest. Beginning
with a known pathway involving specific domain-domain or
protein-substrate interactions at a target, molecular
biological methods can be used to identify in a matter of
days small ensembles of highly constrained peptides from
these huge libraries that bind to these domains with high
affinity and specificity.

While this field has been exploding in the last few
years and showing great potential, it is severely limited by
its use in isolation without the benefit of the integrated
structural analysis needed both to derive the high resolution
structures of binding peptides and also to direct the
construction of additional structured libraries. Drug design
is not aided by having library members that recognize the
protein target of interest without any understanding of
why the recognition occurs. This is entirely similar to the
random screening methods of early fortuitous drug design
efforts.

Unfortunately, rational drug design according to current
approaches (target structure-based) remains an inefficient,
laborious process with a disproportionately high
lead-compound failure rate. Presently, about 90% of lead
compounds fail to emerge successfully from clinical trials
(Trends in U.S. Pharmaceutical Sales and Research and
Development, Pharmaceutical Manufacturing Association,
Washington, D.C., 1993).

It is becoming clear that low-resolution structures of
an entire protein or target (at 0.5-2 Å), or an
uncharacterized lead, such as produced by chemical diversity
methods, leave much to be desired for use in drug design.

WO 96/30849 PCT/US96m229

If the limitations of prior art methods were overcome
and a sufficiently accurate structure needed by a molecule to
bind to a target of interest could be determined, existing
chemical libraries could be searched for highly targeted lead
compounds with similar structure (Martin, 1992, J. Medicinal
Chem. 35:2145-2154). This database search can be based not
only on chemical and electronic properties, but also on
geometric information. Such searches at high
resolution (better than 0.25 Å) would provide a vast
improvement over the prior art, as lower resolutions lead to
an exponentially increasing number of potential leads.
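
A geometric database search of the kind described above can be
sketched in Python as a comparison of pairwise distances between
labeled chemical groups. This is a minimal illustration under
stated assumptions, not a method taken from the cited reference:
the group labels, the coordinates, the compound names, and the
functions pairwise_distances, matches, and search are all
hypothetical, and each compound is assumed to carry at most one
group of each label.

```python
import math
from itertools import combinations

def pairwise_distances(groups):
    # Map each unordered pair of group labels to its distance in Å.
    return {
        frozenset((a, b)): math.dist(groups[a], groups[b])
        for a, b in combinations(sorted(groups), 2)
    }

def matches(query, candidate, tol=0.25):
    # A candidate matches if it has every query group and all of the
    # corresponding inter-group distances agree to within tol (Å).
    if not set(query) <= set(candidate):
        return False
    sub = {label: candidate[label] for label in query}
    q, c = pairwise_distances(query), pairwise_distances(sub)
    return all(abs(q[pair] - c[pair]) <= tol for pair in q)

def search(query, library, tol=0.25):
    return [name for name, groups in library.items()
            if matches(query, groups, tol)]

# Hypothetical pharmacophore geometry and two hypothetical compounds.
QUERY = {"OH": (0.0, 0.0, 0.0), "NH3": (3.0, 0.0, 0.0), "COO": (0.0, 4.0, 0.0)}
LIBRARY = {
    "compound_A": {"OH": (0.0, 0.0, 0.0), "NH3": (3.1, 0.0, 0.0), "COO": (0.0, 4.1, 0.0)},
    "compound_B": {"OH": (0.0, 0.0, 0.0), "NH3": (4.0, 0.0, 0.0), "COO": (0.0, 4.0, 0.0)},
}
```

With this toy data, search(QUERY, LIBRARY, tol=0.25) keeps only
compound_A, while loosening the tolerance to 2.0 Å also admits
compound_B. Distances alone do not distinguish mirror images, so
a practical search would need an additional chirality check.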

Computational methods to determine high resolution
structures from recognition systems and
partial distance measurements are not currently available.
No current structure determination method uses such
additional information to make more efficient or more
accurate determination of high resolution structures
(Holzman, 1994, Amer. Sci. 872:267).

Citation of a reference or discussion hereinabove shall 
not be construed as an admission that such is prior art to 
the present invention. 

3. SUMMARY OF THE INVENTION

It is a broad object of this invention to address the
prior art problems of drug design by providing a method of
rational design of drugs that achieve their effect by binding
to a target molecule or molecular complex of interest.
Importantly, this object is achieved without requiring
determination of the structure of the molecule or molecular
complex ("target molecule") bearing the target or even of the
target itself. The method is target structure independent.
The method of the invention uses an interdisciplinary
combination of computational modeling and simulation,
experimental distance constraints, and molecular biology.

In an important aspect, the invention provides a
computer implemented modeling and simulation method to
determine a highly accurate consensus structure for the
pharmacophore and a structure for the remainder of the
molecule from diversity library members that bind to the
protein target of interest. Where prior structure
determination methods focused on the structure of the target
molecule or of the target, the method of this invention is
uniquely adapted to focus instead on the structures of
molecules that bind to the target. Such structural
information is directly applicable to drug design since it
defines the structure a drug must possess to bind to the
target of interest. Also, this structural information is
much easier to determine by use of the present invention,
since it concerns molecules with many fewer atoms than the
target molecule. The method of the invention achieves
accuracy by improving upon the accuracy and utility of the
input structural information. In a further embodiment of the
invention, the method employed for structural determination
is a smart Monte Carlo technique adapted to small constrained
molecules.

The structure determination method of the invention
allows one to take maximum advantage of the information
obtained from the molecular biological selection of the
diversity library members that tightly and specifically bind
to the target molecule of interest. The selected library
members must share some common structure to bind to the same
target molecule. The smart Monte Carlo computer method of
this invention specifically seeks and provides this common
structure.

The invention also provides a method of performing REDOR
NMR measurements of molecules on a solid phase substrate. In
a preferred embodiment, the substrate is a solid phase on
which the molecule (e.g., peptide) has been synthesized, with
a high degree of purity. In another preferred embodiment,
performing REDOR measurements of such a molecule on a
substrate can be done in a dry nitrogen atmosphere, under
hydrated conditions, and when the molecule is either free or
bound to a target. In a specific embodiment, the REDOR
measurements are accurate to better than 0.05 Å from 0 to 4
Å, and to better than 0.1 Å from 4 to 8 Å. In an
advantageous aspect of the invention, the structure
determination method makes maximum use of these highly
accurate internuclear distance measurements to constrain the
determined common structure for the binding library members.
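
The stated accuracy bands suggest one simple way that measured
distances could act as constraints on a trial structure. The
Python sketch below is an illustration only, assuming the bands
are applied as hard tolerances; the function names and the atom
pair labels are hypothetical and are not taken from the program
listings of this specification.

```python
def redor_tolerance(distance):
    # Accuracy band (Å) quoted in the text for a measured distance (Å).
    if not 0.0 <= distance <= 8.0:
        raise ValueError("distance outside the 0-8 Å range quoted in the text")
    return 0.05 if distance <= 4.0 else 0.1

def violated_constraints(measured, model):
    # Return the atom pairs whose model distance differs from the
    # measured distance by more than the band for that measurement.
    return [pair for pair, d in measured.items()
            if abs(model[pair] - d) > redor_tolerance(d)]
```

For example, with hypothetical measurements {("C1", "N4"): 4.50,
("C2", "N5"): 3.00} and model distances of 4.58 and 3.10 Å for
the same pairs, only ("C2", "N5") is reported as violated, since
a 0.10 Å deviation exceeds the 0.05 Å band below 4 Å.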

The invention also provides methods of identifying a
compound that specifically binds to a target molecule, by
first screening a diversity library, and then using a genetic
selection method for screening the compounds identified from
the diversity library.

In broad aspects, the invention provides a method and
apparatus for rational and predictable design of new and/or
improved drugs that achieve their effect by binding to a
specified target molecule. More particularly, the invention
is directed to a method for the rational selection of highly
specific lead compounds for such drug design, including the
computer implemented step of highly accurate determination of
the structure responsible for this target binding by the
highly accurate, consensus, configurational bias Monte Carlo
method.

A lead compound serves as a starting point for drug
development both because it specifically binds to the protein
target of interest, achieving the biological effect of
interest, and because it has or can be modified to have good
pharmacokinetics and medicinal applicability. A final drug
may be the lead compound or may be derived therefrom by
modifying the lead to maximize beneficial effects and
minimize harmful side-effects. Although any lead compound is
useful, a lead that tightly and specifically binds to the
target molecule of interest in a known manner, such as can be
provided by the invention, is of great use. Knowledge of the
high resolution structures in a lead compound responsible for
its binding and activity provides a more focused and
efficient drug development process.

The methods of the invention improve lead compound
determination by determining the "pharmacophore", the
precise structural characteristics needed for a lead compound
to specifically bind to a target of interest. The most
fundamental specification of a pharmacophore is in terms of
the electronic properties necessary for a molecule to
specifically bind to the surface of a target molecule. These
properties may be fundamentally represented by requirements
on the ground and low lying excited state wave functions of a
pharmacophore, such as, for example, by specifying
requirements on the well known multipole expansion of these
wave functions.

The preferred pharmacophore specification according to
the invention is in terms of both the chemical groups making
up the pharmacophore and determining its electronic
properties and also the geometric relationships of these
groups. This chemical representation is not the only
possible representation of the pharmacophore. Several
chemical arrangements may have similar electronic properties.
For example, if a pharmacophore specification included an -OH
group at a particular position, a substantially equivalent
specification might include an -SH group at the same
position. Equivalent chemical groups that may be substituted
in a pharmacophore specification without substantially
changing its nature are called "homologous".

In particular embodiments, therefore, this invention
provides a method and apparatus for the highly accurate
determination of the pharmacophore needed to specifically
bind to the target molecule of interest, by a specification
of the geometric relationships of the important chemical
groups. The pharmacophore is preferably determined by a
smart Monte Carlo method from molecular biological input
specifying molecules (preferably selected from among
diversity libraries) that specifically bind to the target
molecule and also preferably from REDOR NMR data specifying a
few highly accurate distances in these selected molecules.

An important advantage provided by the invention is the
ability to make a pharmacophore structure determination
without relying on any knowledge of the structure of the
target molecule or target. Where the target molecule is a
protein, conventional prior art methods have sought to
sequence and determine the structure of the protein
containing the target, hoping thereby to determine active
sites by examination of the structure. A further important
advantage of the invention is that this structure
determination can be made by use of a relatively small number
of actual physical position measurements. In contrast,
conventional methods using X-ray crystallography and liquid
NMR require determination of positions of all atoms in the
molecule ("binder") that specifically binds to the target
and in the target. An additional advantage provided by the
invention is that, in a preferred embodiment wherein REDOR
structural measurements provide input information, the
accuracy of the pharmacophore structure determination can be
at least approximately 0.25-0.50 Å or better. This accuracy
is provided by the combination of an efficient Monte Carlo
technique for structure determination with a few highly
accurate distance determinations.

4. BRIEF DESCRIPTION OF THE DRAWINGS 
These and other features, aspects, and advantages of the
present invention will become better understood by reference
to the accompanying drawings, following description, and
appended claims, where:

Fig. 1 is the overall method of this invention in its
broadest aspect;

Fig. 2A and 2B are more detail for the step of Fig. 1
for selecting candidate pharmacophore structures;

Fig. 3 is more detail for the step of Fig. 1 for
performing distance measurements;

Fig. 4 is more detail for the step of Fig. 3 for
performing NMR measurements;

Fig. 5 is REDOR NMR signal response details for the step of
Fig. 3 of data analysis;

Fig. 6 is sample REDOR NMR spectra according to the
method of Fig. 3;

Fig. 7 is sample data analysis according to the method
of Fig. 3;

Fig. 8 is more detail for the step of Fig. 1 for
configurational bias Monte Carlo structure determination;

Fig. 9 is a sample of simulation completion data;

Fig. 10 is further detail of peptide memory
representation used in the method of Fig. 8;

Fig. 11 is additional detail of peptide memory
representation used in the method of Fig. 8;

Fig. 12 is more detail for the step of Fig. 8 of
processor generation of proposed modified structures by Type
I moves;

Fig. 13 is more detail for the step of Fig. 8 of
processor generation of proposed modified structures by Type
II moves;

Fig. 14 is additional detail for the step of Fig. 8 of
processor generation of proposed modified structures by Type
II moves;

Fig. 15 is a structure for implementing the method of
Fig. 8;

Fig. 16 is the main program structure of Fig. 15;

Fig. 17 is the structure modification program structure
of Fig. 15;

Fig. 18A and 18B are the Type I move generator program
structure of Fig. 17;

Fig. 19A and 19B are the Type II move generator program
structure of Fig. 17.

5. DETAILED DESCRIPTION
For clarity of disclosure, and not by way of limitation,
the detailed description of the invention is described as a
series of steps. A broad view of the exemplary steps of
which the invention is comprised is presented in Fig. 1, a
brief overview of which is presented in the text that
follows.

The method of the invention preferably begins with a target
molecule (or molecular complex) 1 having a binding target of
biological or pharmacological interest. Specific binding of
a molecule to the target is predicted to affect its
biological activity and may provide biological effects of
interest. For example, these effects might include
amelioration of a disease process or alteration of a
physiological response. Lead compounds 8 output from the
invention are able to specifically bind to target molecule 1
and can serve as starting points for the design of a drug
able to specifically bind to the target.

Diversity library screening, step 2, allows the
selection from among library members of a plurality of
molecules (hereinafter called "binders") that specifically
bind to target molecule (or molecular complex) 1; the
chemical building block structure (e.g., sequence, structural
formula) is then determined. If predetermined binders and
their structure are already available, the invention can use
this information directly without the need for library
screening. If library screening is done, one or more
libraries may be screened. The selected binders all share a
common pharmacophore structure, allowing their specific
binding to the target in a chemically and physically similar
manner. This common structure is preferably iteratively
determined by a select and test method. Candidate
pharmacophore selection, step 3, is based upon chemical
structure homologies. Geometric and conformational
information need not be used at this step and is
preferably not considered. A candidate pharmacophore shared
by all the N binders is selected, step 3, for structure
determination by subsequent steps. The binders will
typically present several candidate chemical pharmacophores,
ignoring conformation considerations. These candidates are
small groups of library building blocks, often contiguous,
each candidate group in one binder being homologous to the
candidate groups in all the other binders. Building block
homologies are determined by applying rules appropriate to
the diversity library. In the preferred embodiment,
homologous building blocks have similar surface chemical
groups, since pharmacophores are defined by a similar
geometric arrangement of chemical structures. In the case of
the preferred library, CX6C, candidate pharmacophores are
amino acid sequences whose side chain surface groups have
similar chemical properties. Amino acid homologies are
determined by mechanical rules described below. These
candidate sequences are typically 3 amino acids long, but may
range from 2 all the way to 6. Where pharmacophores are
defined by their charge distributions, homologous library
building blocks must have similar charge distributions.
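
Candidate pharmacophore selection by chemical homology, as
described above, can be sketched in Python as a search for
contiguous windows of amino acids whose homology classes recur
in every binder. This is an assumed illustration: the homology
classes below are placeholders and are not the mechanical rules
of this specification, the three binder sequences are invented,
and candidate_pharmacophores is a hypothetical helper name.

```python
# Placeholder homology classes (assumed for illustration only).
HOMOLOGY_CLASSES = {
    "basic": "KRH", "acidic": "DE", "polar": "STNQ",
    "aromatic": "FWY", "aliphatic": "AVLIM", "special": "GPC",
}
CLASS_OF = {aa: name for name, members in HOMOLOGY_CLASSES.items()
            for aa in members}

def class_windows(seq, k):
    # Map each length-k pattern of homology classes to the position
    # of its first occurrence in the sequence.
    classes = [CLASS_OF[aa] for aa in seq]
    windows = {}
    for i in range(len(seq) - k + 1):
        windows.setdefault(tuple(classes[i:i + k]), i)
    return windows

def candidate_pharmacophores(binders, k=3):
    # Keep only class patterns present in every binder and report the
    # matching subsequence found in each binder.
    per_binder = [class_windows(seq, k) for seq in binders]
    shared = set(per_binder[0]).intersection(*per_binder[1:])
    return {
        pattern: [seq[w[pattern]:w[pattern] + k]
                  for seq, w in zip(binders, per_binder)]
        for pattern in shared
    }

# Invented example sequences, each flanked by cysteines.
BINDERS = ["CAKDFGLC", "CGREYLSC", "CTVKDWPC"]
```

For these invented sequences, candidate_pharmacophores(BINDERS, 3)
yields a single shared pattern, (basic, acidic, aromatic), found
as KDF, REY, and KDW. Running the same function for k from 2 to 6
would list candidates over the size range given in the text.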

Having selected binders by screening one or more
libraries and determined a candidate pharmacophore in each
binder, the subsequent steps of distance measurement, step 4,
and Monte Carlo structure determination, step 5, determine a
highly accurate structure for the candidate pharmacophore, if
possible. This determination will be possible if the
candidate is the actual pharmacophore. A subsequent test,
step 6, checks for success of this structure determination.
In particular cases, distance measurements may not be
necessary in order to determine an adequately precise
pharmacophore structure.

Measurements are made, step 4, of a few strategic
distances in the binders that will be most useful for the
subsequent structure determination step. A minimum number of
strategic interatomic distances in the binders are measured
in step 4. These few distances constrain possible binder
structures and make the subsequent complete structure
determination more efficient and more accurate. In preferred
but not limiting embodiments, measurement methods yielding
distances accurate to approximately 0.25 Å or better
are used. The preferred methods use nuclear magnetic
resonance ("NMR") techniques. Particularly preferred is the
rotational-echo double resonance ("REDOR") NMR method for
directly measuring ¹³C-¹⁵N internuclear distances in peptides,
the most accurate current method for simply and inexpensively
obtaining such distances. It is generally capable of
accuracy to 0.1 Å and a span of 8 Å. In a specific
embodiment, peptide binders are synthesized from amino acids
labeled with ¹³C and ¹⁵N. Labeling is chosen to obtain the
most useful distance data about the selected candidate
pharmacophore structures. Either backbone nuclei, side chain
nuclei, or both can be labeled. The step is detailed below.
Liquid NMR techniques can also be used to indirectly
determine internuclear distances in peptides, but are less
preferred since they require considerable data interpretation
to obtain distances of less accuracy than those obtained by
use of REDOR.

Structure determination, step 5, determines a
geometric conformation for both the candidate shared chemical
structures, if possible, and the remainder of the binders.
The preferred but not limiting method, consensus,
configurational bias, Monte Carlo ("CCBMC") determination,
step 5, is an efficient smart Monte Carlo method uniquely
able to incorporate knowledge from prior steps to obtain
highly accurate physical binder structures. From library
screening, step 2, it is deduced that the binders have a
shared, actual pharmacophore structure because they all bind
specifically to the same target molecule (hence, a
"consensus" method). It is not significant to the method if
the binders come from more than one library as long as they
all have a structure adaptable to representation in the
consensus structure determination step (see infra). From
distance measurements, step 4, a few strategically chosen
distances are accurately known. This information is
heuristically utilized along with an accurate model of the
physical atomic interactions and the allowed molecular
conformations.

Further, these means are particularly adapted for
determining structures of molecules having limited
conformational degrees of freedom at the temperature of
interest and conformationally constrained by, e.g., internal
bonds. Potential conformations are generated and selected by
smart configuration bias techniques which avoid generation of
unnecessarily improbable new conformations. (Hence, a
"configuration bias" method.) The technique is preferably
applied herein to conformationally constrained peptides. A
concerted rotation technique is combined with configurational
bias conformation generation so that new conformations
automatically preserve the internally linked backbone
structure constraints. This technique is preferably applied
to the preferred constrained peptide library, of a sequence
comprising CX6C (wherein X is any amino acid). The technique
is also applicable to other constrained peptide libraries, to
peptoid libraries, and to any more general organic diversity
libraries that meet certain geometric limitations (i.e., that
have structures adaptable to representation in the consensus
structure determination step (see infra)).
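
The idea of biasing trial conformations away from improbable ones can be sketched with the textbook configurational bias move: several trial torsion angles are drawn, one is chosen with probability proportional to its Boltzmann factor, and the sum of the factors (the Rosenbluth factor) enters the acceptance test. The Python below is a minimal sketch of that generic technique only. The toy `torsion_energy` function, the parameter names, and the single-torsion scope are assumptions; the concerted rotation and consensus aspects of CCBMC are not shown.

```python
import math
import random
from typing import Callable, Tuple


def torsion_energy(phi: float) -> float:
    """Toy threefold torsional potential (hypothetical, arbitrary units)."""
    return 1.0 + math.cos(3.0 * phi)


def biased_trial(energy: Callable[[float], float], k: int, beta: float,
                 rng: random.Random) -> Tuple[float, float]:
    """Draw k random torsion angles and pick one with Boltzmann-weighted bias.

    Returns (chosen_angle, rosenbluth_factor), where the factor is the sum
    of the k Boltzmann weights of the trial set.
    """
    trials = [rng.uniform(-math.pi, math.pi) for _ in range(k)]
    weights = [math.exp(-beta * energy(t)) for t in trials]
    chosen = rng.choices(trials, weights=weights, k=1)[0]
    return chosen, sum(weights)


def accept(w_new: float, w_old: float, rng: random.Random) -> bool:
    """Accept the regrown conformation with probability min(1, W_new/W_old)."""
    return rng.random() < min(1.0, w_new / w_old)


rng = random.Random(0)
angle, w_new = biased_trial(torsion_energy, k=8, beta=1.0, rng=rng)
```

In a complete move the old conformation's Rosenbluth factor is computed from the existing angle plus k-1 fresh trials, and the factors of all regrown torsions are multiplied together before the acceptance test.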

The methods of the invention are not limited to the use
of CCBMC for determining a consensus pharmacophore structure.
Alternative embodiments of this invention may use alternative
structure determination methods to determine a consensus
pharmacophore structure. For example, a simple yet expensive
method is to make exhaustive REDOR NMR measurements
characterizing the candidate pharmacophore in each binder and
then average these measurements. A somewhat less expensive
method is to use a conventional Monte Carlo molecular
structure determination method to limit somewhat the number
of REDOR NMR measurements required to characterize the
candidate pharmacophore. Conventional Monte Carlo methods,
being unable to directly make use of partial distance
measurements or consensus binding information, are less
efficient than the CCBMC method and require more distance
measurements. Further, other known techniques of molecular
structure determination, for example folding rules or
molecular dynamics, can be used in place of conventional
Monte Carlo.

The success of the structure determination is tested,
step 6, against various convergence and success criteria.
Consistency tests, step 6, are applied to the resulting
structure to determine whether the candidate pharmacophore
previously selected is the actual pharmacophore. One set of
tests checks predicted distances against new distance
measurements or against previous measurements temporarily not
used as structure constraints. A second set of tests checks
heuristically whether the candidate pharmacophore exhibits
the expected low energy consensus structure. The tests are
described further below. If a shared structure is found, the
candidate pharmacophore must be the actual pharmacophore. If
not, another candidate pharmacophore is selected and another
shared structure is determined, if possible. An actual
pharmacophore exists and will eventually be found and
accurately structured.
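
The first kind of consistency test, comparing predicted distances with measurements that were held back from the structure determination, can be sketched as follows. This is a minimal illustration only: the atom labels, the example distances, and the default tolerance of 0.25 Å (borrowed from the measurement accuracy mentioned earlier) are assumptions, not criteria stated by the specification.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # labels of the two nuclei defining a distance


def distances_consistent(predicted: Dict[Pair, float],
                         held_out: Dict[Pair, float],
                         tolerance: float = 0.25) -> bool:
    """True if every held-out measured distance (in angstroms) is matched
    by the predicted distance to within `tolerance`."""
    return all(abs(predicted[pair] - measured) <= tolerance
               for pair, measured in held_out.items())


# Hypothetical predicted distances from a determined structure.
predicted = {("C1", "N4"): 4.10, ("C2", "N5"): 5.62}
```

A candidate failing such a check would be rejected and the next candidate pharmacophore tried, as described above.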

Upon passing these tests, the methods of the invention
have provided a consensus structure for the selected
candidate pharmacophore, preferably accurate to at least
approximately 0.25-0.50 Å, as well as structures for the
remainder of the binder molecules. Lead compound selection,
step 7, uses these structures to determine or select highly
targeted lead compounds 8. One method of lead selection is
to design new organic molecules of pharmacologic utility with
the determined pharmacophore structure. Another method
selects leads from databases of molecular descriptions.
Conventionally known to medicinal chemists are databases of
potential drug compounds indexed by their significant
chemical and geometric structure (e.g., the Standard Drugs
File (Derwent Publications Ltd., London, England), the
Beilstein database (Beilstein Information, Frankfurt, Germany
or Chicago), and the Chemical Registry database (CAS,
Columbus, Ohio)). The determined pharmacophore, being a
chemical and geometric structure in the preferred embodiment,
is used to query such a database. Search results will be
those compounds with homologous chemical groups arrayed in a
very closely similar geometric arrangement. These are lead
compounds 8 output from this invention and input to the
process of drug testing and development.
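
A geometric database query of the kind described can be sketched as a comparison of pairwise distances between labeled chemical groups, with a match declared when every query distance is reproduced within a tolerance. The Python below is a simplified illustration under stated assumptions: each compound is reduced to one 3-D point per feature label, the labels and coordinates are made up, and no real database format or query interface is implied.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Features = Dict[str, Point]  # feature label -> position in angstroms


def pair_distances(features: Features) -> Dict[Tuple[str, str], float]:
    """Distance between every pair of labeled features."""
    return {(a, b): dist(features[a], features[b])
            for a, b in combinations(sorted(features), 2)}


def matches(query: Features, compound: Features, tol: float) -> bool:
    """True if the compound reproduces every query distance within tol."""
    q, c = pair_distances(query), pair_distances(compound)
    return all(pair in c and abs(c[pair] - d) <= tol for pair, d in q.items())


def search(query: Features, database: Dict[str, Features],
           tol: float) -> List[str]:
    """Names of all database compounds matching the query geometry."""
    return sorted(name for name, feats in database.items()
                  if matches(query, feats, tol))


# Hypothetical pharmacophore and two made-up database entries.
query = {"acid": (0.0, 0.0, 0.0), "base": (5.0, 0.0, 0.0),
         "aromatic": (0.0, 4.0, 0.0)}
database = {
    "cmpd_A": {"acid": (0.0, 0.0, 0.0), "base": (5.1, 0.0, 0.0),
               "aromatic": (0.0, 4.1, 0.0)},
    "cmpd_B": {"acid": (0.0, 0.0, 0.0), "base": (6.0, 0.0, 0.0),
               "aromatic": (0.0, 4.0, 0.0)},
}
```

Because only internal distances are compared, the test is independent of how each compound is oriented in space; a loose tolerance admits more compounds than a tight one.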

Although the preferred identity and ordering of the
method steps is presented in Fig. 1, the invention is not
limited to this identity and ordering. Other orderings,
especially of steps 3, 4, and 5, are possible to achieve
certain efficiencies. Steps can be inserted and deleted for
optimal effect. For example, an additional partial structure
determination step can be inserted between existing steps 3
and 4 to provide information on how best to make the step 4
strategic measurements. As another example, in an
alternative aspect, in lieu of screening one or more
libraries to select binders, predetermined binders can be
obtained and used (e.g., binders determined by any means to
be specific to the same target molecule); thus, step 2 can be
omitted. In another embodiment, step 4, the measurement
step, can be omitted. While all method steps in the
preferred embodiment assume an aqueous environment at body
temperature (37 °C), to the extent these parameters are
relevant to the particular step, the invention is not limited
to human environmental parameters.

Screening against a diversity library consists of
selecting by assay those library members which bind
specifically to the target molecule of interest. Binding
specificity is preferably a binding constant of less than 1
µM (micromolar), and more preferably less than 100 nM
(nanomolar). Preferably, an assay is done that detects an
effect of binding of the binder to the target molecule on the
target molecule's biological activity, to ensure that the
binding is actually to the biological target of interest.
Also, preferably, the selected binders are tested to further
select those binders that bind to the target molecule
competitively, to ensure that each binds to the same target
in the target molecule.
The output of the screening step is a number, N, of
binders selected from one or more libraries for use by the
subsequent steps of the method. The binders with highest
affinity are preferably selected for use by the subsequent
steps. The chemical structure of each of the N binders
selected for use is determined as part of the member
synthesis and library screening. The primary chemical
structure of the preferred constrained peptide library is
specified by the amino acid sequence of the -X6- portion of
the CX6C molecule. For more general organic diversity
libraries, the selection and arrangement of library building
blocks in the binders must be determined.
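
The selection logic of the screening step (keep members whose binding constant is below a cutoff, then prefer the tightest binders) can be sketched in a few lines. This is an illustration only: the member names and values are made up, and the binding constant is treated as a dissociation-type constant in molar units, so that a smaller value means tighter binding.

```python
from typing import Dict, List


def select_binders(binding_constants: Dict[str, float], n: int,
                   cutoff_molar: float = 1e-6) -> List[str]:
    """Return up to n specific binders, tightest (smallest constant) first.

    A member counts as specific when its binding constant is below
    `cutoff_molar` (default 1 micromolar).
    """
    specific = [name for name, k in binding_constants.items()
                if k < cutoff_molar]
    specific.sort(key=lambda name: binding_constants[name])
    return specific[:n]


# Hypothetical assay results, in molar units.
assay = {"pepA": 5e-8, "pepB": 2e-6, "pepC": 3e-7, "pepD": 9e-9}
```

Passing `cutoff_molar=1e-7` applies the more preferred 100 nM threshold instead of the default 1 µM.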
It is a preferred aspect of this invention that the set
of determined lead compounds is selective and small. Example
1 illustrates that as pharmacophore distance tolerances are
relaxed, the number of compounds retrieved by drug database
searches increases geometrically. As this invention
determines high resolution pharmacophore geometries, it can
be expected that database searches, or other methods of
determining leads from pharmacophore structure, will return
only a few, selective, targeted leads. Methods limiting the
number of leads decrease the cost of drug development and are
consequently of considerable utility to the pharmaceutical
industry and medical community. The expense of developing
and evaluating lead compounds for biological effect and
medicinal usefulness is well known. Each lead compound must
be screened for pharmacological usefulness, efficacy, and
safety; often chemical modifications are required and the
process must be repeated. Finally, the required in vivo
pharmacologic toxicity and clinical trials alone can consume
years of time and millions of dollars.

Therefore, starting with a target molecule 1 having a
biologically or pharmacologically interesting target, the
method of this invention determines a consensus
pharmacophore structure. This consensus pharmacophore
structure can then be used to determine a selective set of
lead compounds for the design
of drugs, e.g., capable of acting as ligand-mimics (agonists
or antagonists) for the particular target molecule.
In the following discussion and examples, each of these
steps will be more fully described.

5.1. SELECTION OF A TARGET MOLECULE
The target molecule is any one or more molecules
containing a target or putative target of interest. The
target is a binding interaction region. The target can be in
a single molecule or can be a product of a molecular complex.
The target can be a continuous or discontinuous binding
region. The target molecule selected for use (Fig. 1, step
1) is preferably any molecule that is found in vivo
(preferably in mammals, most preferably in humans) and that
has biological activity, preferably involved or putatively
involved in the onset, progression, or manifestation of a
disease or disorder. The target molecule can also be a
fragment or derivative of such an in vivo molecule, or a
chemical entity that contains the same target as the in vivo
molecule. Examples of such molecules are well known in the
art. Such molecules can be of mammalian, human, viral,
bacterial, or fungal origin, or from a pathogen, to give just
some examples. The target molecule is preferably a protein
or protein complex. The target molecules that can be used
include but are not limited to receptors, ligands for
receptors, antibodies or portions thereof (e.g., Fab, Fab',
F(ab')2, constant region), proteins or fragments thereof,
nucleic acids, glycoproteins, polysaccharides, antigens,
epitopes, cells and cellular components, subcellular
particles, carbohydrates, enzymes, enzyme substrates,
oncogenes (e.g., cellular, viral; oncogenes such as ras, raf,
etc.), growth factors (e.g., epidermal growth factor,
platelet-derived growth factor, fibroblast growth factor),
lectins, protein A, protein G, organic compounds,
organometallic compounds, viruses, prions, viroids, lipids,
fatty acids, lipopolysaccharides, peptides, cellular
metabolites, steroids, vitamins, amino acids, sugars,
lipoproteins, cytokines, lymphokines, hormones, T cell
surface antigens (e.g., CD4, CD8, T cell antigen receptor),
ions, organic chemical groups, viral antigens (hepatitis B
virus surface or core antigens, HIV antigens (e.g., gp120,
gp46)), hepatitis C virus antigens, toxins (e.g., bacterial
toxins), cell wall components, platelet antigens (e.g.,
gpIIbIIIa), cell surface proteins, cell adhesion molecules,
neurotrophic factors, and neurotrophic factor receptors.

In specific embodiments, vEGF (vascular endothelial
growth factor) or KDR (the receptor for vEGF) (Terman et al.,
1992, Biochem. Biophys. Res. Comm. 187:1579-1586) is the
target molecule. vEGF and its receptor are the major
regulators of vasculogenesis and angiogenesis (Millauer et
al., 1993, Cell 72:835). Inhibition of vEGF and the
concomitant inhibition of its mitogenic activity and
angiogenic capacity has been shown to suppress tumor growth
in vivo (Kendall et al., 1993, Proc. Natl. Acad. Sci. USA
90:10705-10709; Kim et al., 1993, Nature 362:841-844). Use
of vEGF or KDR, or portions thereof, as a target molecule is a
preferred embodiment for use of the present invention to
develop lead molecules as drugs in the area of cardiovascular
disease or cancer.

The proteins ras and raf, or portions thereof (e.g.,
modules, functional portions), are also preferred target
molecules, particularly in an embodiment wherein the methods
of the present invention are employed to develop lead
molecules for drugs that are cancer therapeutics. ras is a
member of an intracellular signaling cascade that controls
cell growth and differentiation (Cook and McCormick, 1994,
Nature 369:361-362). ras functions in signal transduction by
specifically recognizing the protein raf and bringing it to
the cell membrane (Hall, 1994, Science 264:1413-1414; Vojtek
et al., 1993, Cell 74:205-214). The recognition modules in
both ras and raf have been determined (Zhang et al., 1993,
Nature 364:308-313; Warne et al., 1993, Nature 364:352-355;
and Vojtek et al., 1993, Cell 74:205-214); in a specific
embodiment, such a recognition module is used as a target
molecule according to the invention.

In another specific embodiment, an integrin is used as a
target molecule. Such molecules are known to function in
clot formation, and can be used according to the present
invention to develop lead molecules for drugs in the area of
cardiovascular disorders.

Target molecules for use can be obtained commercially
(where the target is commercially available), or can be
synthesized or purified from natural or recombinant sources.
In a specific embodiment, a target molecule is prepared that
has been modified to incorporate an "affinity tag," i.e., a
structure that specifically binds to a known binding partner,
to facilitate recovery/isolation/immobilization of the target
molecule. In a preferred aspect, recombinant expression
methods well known in the art can be used to produce a
protein target molecule as a fusion protein, incorporating a
peptide affinity tag. Such affinity tags include but are not
limited to epitopes of known antibodies (e.g., c-myc epitope
(Evan et al., 1985, Mol. Cell. Biol. 5:3610-3616)), a series
(e.g., 5-7) of his residues (which bind to zinc), maltose
binding sequences such as pmal, etc. Tags are incorporated
into protein targets at either the amino- or carboxy-terminus.
In another embodiment, the target is chemically attached to a
tag (e.g., biotin (which binds to avidin, streptavidin),
streptavidin), e.g., by biotinylation.

The target molecule is purified by standard methods.
For example, a protein target can be purified by standard
methods including chromatography (e.g., ion exchange,
affinity, and sizing column chromatography), centrifugation,
differential solubility, or by any other standard technique
for the purification of proteins; in a preferred embodiment,
reverse phase HPLC (high performance liquid chromatography)
is employed.

Once the target molecule has been purified, it is
preferably tested to ensure that it retains its biological
activity (and thus retains its native conformation). Any
suitable in vitro or in vivo assay can be used. In instances
where the desired target molecule is a fragment or derivative
of a molecule found in vivo, or is a chemical entity
putatively containing the same target as a molecule found in
vivo, it is highly preferred that [...]
desired target molecules prior to their use [...]
known ligand to the [...]
use as target molecules according to the [...]. In the
event that biological activity [...]
recombinant protein relative to the native form of the
[...] (e.g., [...]
yeast, mammalian, or insect) and/or
with a variety of tags and location of tags (on the
amino- or carboxy-terminal side) in order to attempt to
achieve, or to optimize, recovery of [...]

According to a preferred embodiment of the invention,
diversity libraries are screened to select binders that
specifically bind to the target molecule. [...]
members and [...] that all possible
members are represented, the more preferred the library. In
preferred embodiments, [...]
using standard [...]
recombinant expression libraries or [...] libraries, [...]
exemplary types of [...]
constrained libraries [...]
structural rigidity). Examples of constrained libraries are
described below. A linear, or nonconstrained library, is
less preferred although it may be used. Additionally, one or
more different libraries can be screened to select binders.

In a preferred embodiment, the library contains peptide
or peptide analogs having a length in the range of 5-18 amino
acids or analogs thereof in each library member.

In specific embodiments, binders are identified from a
random peptide expression library or a chemically synthesized
random peptide library. The term "random" peptide libraries
is meant to include within its scope libraries of both
partially and totally random (variant) peptides.

In one embodiment, the peptide libraries used in the
present invention may be libraries that are chemically
synthesized in vitro. Examples of such libraries are given
in Fodor et al., 1991, Science 251:767-773, which describes
a known array of short peptides on an
individual microscopic slide; Houghten et al., 1991, Nature
354:84-86, which describes mixtures of free hexapeptides in
which the first and second residues in each peptide were
individually and specifically defined; Lam et al., 1991,
Nature 354:82-84, which describes a "one bead, one peptide"
approach in which a solid phase split synthesis scheme
produced a library of peptides in which each bead in the
collection had immobilized thereon a single, random sequence
of amino acid residues; Medynski, 1994, Bio/Technology
12:709-710, which describes split synthesis and T-bag
synthesis methods; and Gallop et al., 1994, J. Medicinal
Chemistry 37(9):1233-1251. Simply by way of other examples,
a combinatorial library may be prepared for use according to
the methods of Ohlmeyer et al., 1993, Proc. Natl. Acad. Sci.
USA 90:10922-10926; Erb et al., 1994, Proc. Natl. Acad. Sci.
USA 91:11422-11426; Houghten et al., 1992, Biotechniques
13:412; Jayawickreme et al., 1994, Proc. Natl. Acad. Sci. USA
91:1614-1618; or Salmon et al., 1993, Proc. Natl. Acad. Sci.
USA 90:11708-11712. PCT Publication No. WO 93/20242 and
Brenner and Lerner, 1992, Proc. Natl. Acad. Sci. USA
89:5381-5383 describe "encoded combinatorial chemical
libraries," that contain oligonucleotide identifiers for each
chemical polymer library member.

In another embodiment, biological random peptide
libraries are used to identify a binder which binds to a
target molecule of choice. Many suitable biological random
peptide libraries are known in the art and can be used or can
be constructed and used to screen for a binder that binds to
a target molecule, according to standard methods commonly
known in the art.

According to this approach, involving recombinant DNA
techniques, peptides are expressed in biological systems as
either soluble fusion proteins or viral capsid fusion
proteins.

In a specific embodiment, a phage display library, in
which the protein of interest is expressed as a fusion
protein on the surface of a bacteriophage, is used (see,
e.g., Smith, 1985, Science 228:1315-1317). A number of
peptide libraries according to this approach have used the
M13 phage. Although the N-terminus of the viral capsid
protein, protein III (PIII), has been shown to be necessary
for viral infection, the extreme N-terminus of the mature
protein does tolerate alterations such as insertions. The
protein PVIII is a major M13 viral capsid protein, which can
also serve as a site for expressing peptides on the surface
of M13 viral particles, in the construction of phage display
libraries. Other phage such as lambda have been shown also
to be able to display peptides or proteins on their surface
and allow selection; these vectors may also be suitable for
use in production of libraries (Sternberg and Hoess, 1995,
Proc. Natl. Acad. Sci. USA 92:1609-1613).

Various random peptide libraries, in which the diverse 
peptides are expressed as phage fusion proteins, are known in 
the art and can be used. Examples of such libraries are 
described below. 

Scott and Smith, 1990, Science 249:386-390 describe
construction and expression of a library of hexapeptides on
the surface of M13. The library was made by inserting a 33
base pair Bgl I digested oligonucleotide sequence into an Sfi
I digested phage fd-tet, i.e., fUSE5 RF. The 33 base pair
fragment contains a random or "degenerate" coding sequence
(NNK), where N represents G, A, T or C and K represents G or
T. Cwirla et al., 1990, Proc. Natl. Acad. Sci. USA
87:6378-6382 also described a library of hexapeptides expressed as
PIII gene fusions of M13 fd phage. PCT publication WO
91/19818 dated December 26, 1991 by Dower and Cwirla
describes a library of pentameric to octameric random amino
acid sequences.
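
The effect of a degenerate coding scheme such as NNK can be checked by enumerating the codons it permits against the standard genetic code. The Python sketch below does this for any three-letter scheme written in IUPAC ambiguity codes. The function and variable names are illustrative assumptions; the codon table is the standard code laid out in the conventional T, C, A, G order.

```python
from itertools import product
from typing import Dict, Set, Tuple

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# The IUPAC ambiguity codes needed here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "GATC", "K": "GT", "S": "GC"}


def scheme_summary(scheme: str) -> Tuple[int, Set[str], int]:
    """Return (number of codons, amino acids encoded, number of stop codons)
    for a three-letter degenerate scheme such as "NNK"."""
    codons = ["".join(c) for c in product(*(IUPAC[s] for s in scheme))]
    encoded = [CODON_TABLE[c] for c in codons]
    return len(codons), set(encoded) - {"*"}, encoded.count("*")


n_codons, amino_acids, n_stops = scheme_summary("NNK")
```

Restricting the third position to G or T halves the codon count while still encoding all 20 amino acids and leaving a single stop codon (TAG), which is why such schemes are favored for random peptide inserts.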

Devlin et al., 1990, Science 249:404-406, describes a
peptide library of about 15 residues generated using an (NNS)
coding scheme for oligonucleotide synthesis in which S is G
or C.

Christian and colleagues have described a phage display
library, expressing decapeptides (Christian, R.B., et al.,
1992, J. Mol. Biol. 227:711-718). The DNA of the library was
constructed by use of an oligonucleotide comprising the
degenerate codons [NN(G/T)]10 (SEQ ID NO: 8) with a
self-complementary 3' terminus. This sequence forms a hairpin
which creates a self-priming replication site that was used
by T4 DNA polymerase to generate the complementary strand.
The double-stranded DNA was cleaved at the SfiI sites at the
5' terminus and hairpin for cloning into the fUSE5 vector
described by Scott and Smith, supra.

Lenstra, 1992, J. Immunol. Meth. 152:149-157 describes a
library that was constructed by annealing oligonucleotides of
about 17 or 23 degenerate bases with an 8 nucleotide long
palindromic sequence at their 3' ends. This resulted in the
expression of random hexa- or octa-peptides as fusion
proteins with the β-galactosidase protein in a bacterial
expression vector. The DNA was then converted into a
double-stranded form with Klenow DNA polymerase, blunt-end ligated
into a vector, and then released as Hind III fragments.
These fragments were then cloned into an expression vector at
the sequence encoding the C-terminus of a truncated
β-galactosidase to generate 10' recombinants.

Kay et al., 1993, Gene 128:59-65 describes a random 38
amino acid peptide phage display library.

PCT Publication No. WO 94/18318 dated August 18, 1994
describes random peptide phage display "TSAR libraries" that
can be used.

Other biological peptide libraries which can be used
include those described in U.S. Patent No. 5,270,170 dated
December 14, 1993 and PCT Publication No. WO 91/19818 dated
December 26, 1991.

In a specific embodiment, a "peptide-on-plasmid"
library, containing random peptides fused to a DNA binding
protein that links the peptides to the plasmids encoding
them, can be used (Cull et al., 1992, Proc. Natl. Acad. Sci.
USA 89:1865-1869).

Another alternative to phage display or chemically
synthesized libraries is a polysome-based library, which is
based on the direct in vitro expression of the peptides of
interest by an in vitro translation system (in some
instances, coupled to an in vitro transcription system).
These methods rely on polysomes to translate the genomic
information (in this case encoded by an mRNA molecule, in
some instances made in vitro by transcription from synthetic
DNA) (see, e.g., Korman et al., 1982, Proc. Natl. Acad. Sci.
USA 79:1844-1848). Such in vitro translation-based libraries
include but are not limited to those described in PCT
Publication No. WO 91/05058 dated April 18, 1991; and
Mattheakis et al., 1994, Proc. Natl. Acad. Sci. USA
91:9022-9026.

Diversity library screening, step 2 of Fig. 1,
determines a few, N, members (compounds) from one or more
libraries, and their primary sequences, all of which
specifically bind to target molecule 1 in a similar manner.
A structured organic diversity library is a prescription for
the creation of a huge number of related molecules all built
from combinations of a small number of chemical building
blocks. Preferred diversity libraries for use according to
the invention have members whose binding to a target molecule
is characterized by configurational entropy changes that are
small relative to the binding energy. This means that
library members have definite structures in the bound and,
especially, the unbound states. A preferred example of a
chemical diversity library for use in the invention contains
short peptides with a constrained conformation. Short
peptides without constrained conformations are often freely
flexible in an aqueous environment and adopt no fixed unbound
structure. The binding of such library members is
complicated by significant configurational entropy changes.
To eliminate this complication, it is preferred that all
library members have a constrained structure and bind to the
target molecule in a specific and identifiable manner. One
method of achieving constrained conformation is to require
internal linking, such as by disulfide bonds.

In one embodiment, disulfide bond formation is achieved
by use of libraries that contain peptides having a pair of
invariant cysteine residues, preferably positioned in the
range of 2-16 residues apart, most preferably 6-8 residues
apart, that cross-link in an oxidizing environment to form
cystines (disulfide bonds between cysteines). An example of
such libraries are those containing or expressing peptides of
the form R1CXnCR2 wherein R1 is a sequence of 0-10 amino acids,
C is cysteine, Xn is a sequence of n variant amino acids
(e.g., if all 20 classical amino acids are represented, X
means any one of the 20 classical amino acids); n is an
integer ranging from 2 to 16; and R2 is a sequence of 0-10
amino acids. R1 and R2 can contain invariant or variant amino
acids. Another example of such libraries are those
containing or expressing peptides of the form R1CXnR2 where
R1, X, n, and R2 are as described above; n is preferably 8 or
9. A preferred constrained peptide library, of at least 10^[...]
members, consists of peptides comprising the sequence CX6C
(SEQ ID NO:1), wherein C is cysteine, X is any naturally
occurring amino acid, and a disulfide bond is formed between
the two cysteines. Additional invariant amino acids (e.g.,
preferably no more than 5-10 amino acids) on either the
amino- or carboxy-terminus of CX6C can be incorporated as part
of the peptide in this preferred embodiment. Fig. 10
schematically illustrates such a molecule. The disulfide
bridge between the two cysteines acts as a sufficient
conformational constraint for the preferred practice of this
invention. By way of example, the library is constructed by
generating oligonucleotides with the desired degeneracy to
code for the peptides and ligating them into vectors of
choice. These inserted oligonucleotides are suitable for
use both in in vivo genetic expression systems, exemplified by
phage display, and in in vitro translation methods based on
coupled transcription and translation from DNA of interest
(see below). The creation and use of an exemplary library is
described in Section 6.3 hereinbelow. The invention is
easily and readily adaptable to other alternative peptide
libraries which include short peptides with alternative
disulfide scaffolding, for example, comprising the sequence
CXnCXmCC with two disulfide bridges, wherein n and m are each
independently an integer in the range of 2-10, and X is any
amino acid. More generally, any peptide library containing
members of definite conformation which bind to a target
molecule in a specific and identifiable manner may be used.
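
To make the size and shape of a cysteine-bracketed library concrete, the sketch below computes the theoretical sequence diversity of a variable region of n residues and extracts that region from a peptide with invariant flanking residues. It is an illustration under stated assumptions: the default n = 6, the function names, and the example peptides are hypothetical, and X is allowed to be any of the 20 classical amino acids, cysteine included.

```python
import re
from typing import Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 classical amino acids


def library_diversity(n: int, alphabet_size: int = 20) -> int:
    """Number of distinct variable regions of length n."""
    return alphabet_size ** n


def constrained_loop(peptide: str, n: int = 6) -> Optional[str]:
    """Return the n variable residues of the first C-Xn-C motif found,
    or None if the peptide contains no such motif."""
    match = re.search(rf"C([{AMINO_ACIDS}]{{{n}}})C", peptide)
    return match.group(1) if match else None


loop = constrained_loop("GSCARDFKLCGG")
```

For n = 6 the theoretical diversity is 20^6 = 64,000,000 sequences, which indicates the library size needed before every possible member can be represented.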

Further, more general, structurally constrained, organic
diversity (e.g., nonpeptide) libraries can also be used. By
way of example, a benzodiazepine library (see e.g., Bunin et
al., 1994, Proc. Natl. Acad. Sci. USA 91:4708-4712) may be
adapted for use.

Constrained libraries that can be used are also known in
the art. For example, PCT Publication No. WO 94/18318 dated
August 18, 1994 describes semirigid phage display libraries,
in which the plurality of expressed peptides can adopt only a
single or a small number of conformations. Examples of such
libraries have a pair of invariant cysteine residues
positioned in or flanking random residues which, when
expressed in an oxidizing environment, are most likely
cross-linked by disulfide bonds to form cystines. Also
disclosed are libraries having a cloverleaf structure by
appropriate arrangement of cysteine residues. Also disclosed
are libraries with peptides having invariant cysteine and
histidine residues positioned within the random residues, or
invariant histidines alone within the random residues.
TSAR-13 and TSAR-14 are exemplary semirigid libraries
disclosed therein.

Other conformationally constrained libraries that can be
used include but are not limited to those containing modified
peptides (e.g., peptides that incorporate fluorine, metals, or
isotopic labels, or that are phosphorylated, etc.), peptides
containing one or more non-naturally occurring amino acids,
non-peptide structures, and peptides containing a significant
fraction of γ-carboxyglutamic acid.

As stated above, libraries of non-peptides, e.g.,
peptide derivatives (for example, containing
non-naturally occurring amino acids) can also be used. One
example of these is peptoid libraries (Simon et al., 1992,
Proc. Natl. Acad. Sci. USA 89:9367-9371). Peptoids are
polymers of non-natural amino acids that have naturally
occurring side chains attached not to the alpha carbon but to
the backbone amino nitrogen. Since peptoids are not easily
degraded by human digestive enzymes, they are advantageously
more easily adaptable to drug use. Another example of a
library that can be used, in which the amide functionalities
in peptides have been permethylated to generate a chemically
transformed combinatorial library, is described by Ostresh et
al. (1994, Proc. Natl. Acad. Sci. USA 91:11138-11142).

The peptide or peptide portions of members of the
libraries that can be screened according to the invention are
not limited to containing the 20 naturally occurring amino
acids. In particular, chemically synthesized libraries and
polysome based libraries allow the use of amino acids in
addition to the 20 naturally occurring amino acids (by their
inclusion in the precursor pool of amino acids used in
library production). In specific embodiments, the library
members contain one or more non-natural or non-classical
amino acids or cyclic peptides. Non-classical amino acids
include but are not limited to the D-isomers of the common
amino acids, α-amino isobutyric acid, 4-aminobutyric acid,
Abu, 2-amino butyric acid, γ-Abu, ε-Ahx, 6-amino hexanoic
acid, Aib, 2-amino isobutyric acid, 3-amino propionic acid,
ornithine, norleucine, norvaline, hydroxyproline, sarcosine,
citrulline, cysteic acid, t-butylglycine, t-butylalanine,
phenylglycine, cyclohexylalanine, β-alanine, designer amino
acids such as β-methyl amino acids, Cα-methyl amino acids,
Nα-methyl amino acids, fluoro-amino acids and amino acid
analogs in general. Furthermore, the amino acid can be D
(dextrorotary) or L (levorotary).

By way of example, the incorporation of non-standard or
modified amino acids into libraries can be done by taking
advantage of concurrent development in reassigning the
genetic code (Noren et al., 1989, Science 244:182-188;
Benner, 1994, Trend. BioTech. 12:158-163) and the charging of
specific tRNAs with the desired amino acid (Cornish et al.,
1994, Proc. Natl. Acad. Sci. USA 91:2910-2914). See also
Ibba and Hennecke, 1994, Bio/Technology 12:678-682
(particularly Table I), and references cited therein. These
pre-charged tRNAs are then utilized in the in vitro
translation system to incorporate the non-standard amino acid
into the library of choice. The position of incorporation
can be either random (variant) or defined (invariant). The
defined case can be chosen to maximize the utility of the
resulting placement of the non-natural functional group to
maximize either binding properties or the ability to perform
structural measurements. Similar techniques may be used to
incorporate non-standard amino acids into the peptides.
In a specific embodiment, an iterative approach to
library construction can be taken, as structural information
on the mode of binding to a given target is obtained. For
example, information from structural analysis can be used to
make libraries with library members containing chemical
backbones that match known chemical scaffolds, enhance
solubility or membrane permeability, reduce effect of water
on structure, and incorporate other physical parameters
suggested by structural analysis. Algorithmically
optimized library inserts can be used to increase the chances
of finding binders of interest (see e.g., Arkin and Youvan,
1992, Bio/Technology 10:297-300).
In other embodiments, the following can be used to
improve library use in both phage and bacterial systems:
production of libraries in bacteria which overproduce the
chaperonins GroES and GroEL (Soderlind et al., 1993,
Bio/Technology 11:503-507), and production in E. coli strains
which prevent degradation in the periplasmic space (Strauch
and Beckwith, 1988, Proc. Natl. Acad. Sci. USA 85:1576-1580;
Lipinska et al., 1989, J. Bacteriology 171:1574-1584).
Purified cofactors such as GroES and GroEL could also be
directly added to an in vitro expression and selection
system.



5.3. SCREENING OF DIVERSITY LIBRARIES

Once a suitable diversity library has been constructed
(or otherwise obtained), the library is screened to identify
binders having binding affinity for the target. Screening is
done by contacting the diversity library members with the
target molecule under conditions conducive to binding and
then identifying the member(s) which bind to the target
molecule. Screening the libraries can be accomplished by any
of a variety of commonly known methods. See, e.g., the
following references, which disclose screening of peptide
libraries: Parmley and Smith, 1989, Adv. Exp. Med. Biol.
251:215-218; Scott and Smith, 1990, Science 249:386-390;
Fowlkes et al., 1992, BioTechniques 13:422-427; Oldenburg et
al., 1992, Proc. Natl. Acad. Sci. USA 89:5393-5397; Yu et
al., 1994, Cell 76:933-945; Staudt et al., 1988, Science
241:577-580; Bock et al., 1992, Nature 355:564-566; Tuerk et
al., 1992, Proc. Natl. Acad. Sci. USA 89:6988-6992; Ellington
et al., 1992, Nature 355:850-852; U.S. Patent No. 5,096,815,
U.S. Patent No. 5,223,409, and U.S. Patent No. 5,198,346, all
to Ladner et al.; Rebar and Pabo, 1993, Science 263:671-673;
and PCT Publication No. WO 94/18318. See also the references
cited in Section 5.2 hereinabove (disclosing libraries)
regarding methods for screening.

Screening can be carried out by contacting the library
members with an immobilized target molecule and harvesting
those library members that bind to the target. Examples of
such screening methods, termed "panning" techniques, are
described by way of example in Parmley and Smith, 1988, Gene
73:305-318; Fowlkes et al., 1992, BioTechniques 13:422-427;
PCT Publication No. WO 94/18318; and in references cited
hereinabove. In panning methods that can be used to screen
the libraries, the target molecule can be immobilized on
plates, beads, such as magnetic beads, sepharose, etc., or on
beads used in columns. In particular embodiments, the
immobilized target molecule has incorporated an "affinity
tag" as described above, which can be used to effect
immobilization by attaching the tag's binding partner to the
desired solid phase.

In one embodiment, the primary method of selecting from
libraries is the use of solid phase plastic affinity capture
to immobilize the target molecule prior to its use in the
selection (screening) process. This method can be improved
upon to increase throughput, selectivity and specificity.
Solid phase plastic supports can be replaced with magnetic
bead systems. Large beads can be used,
but these are not believed to be suitable, due to steric
hindrance, for use in bacterial systems. This steric
hindrance can be avoided by using high gradient magnetic cell
separation with small particles (about 0.5 μm) (Miltenyi et al.,
1990, Cytometry 11:231-238).
In a specific embodiment involving the use of a peptide
phage display library, selection of a binder protein
displayed on a bacteriophage thus selects both the binder and
the nucleic acid encoding it (carried in the phage
particle). Following binding between the phage and the
target molecule, the phage-target molecule complex
is immobilized, and the bound phage are amplified, e.g., by
infecting E. coli
and propagating each isolated binding phage. Repeating this
process of affinity capture and amplification allows those
peptides which bind with the highest affinity to the target
molecule to be selectively enriched from the original
library.

In one particular embodiment, presented by way of
example but not limitation, a phage display library can be
screened as follows using magnetic beads (see PCT Publication
No. WO 94/18318):

Target molecules are conjugated to magnetic
beads, according to the instructions of the
manufacturers. The beads are incubated with excess
bovine serum albumin (BSA), to block non-specific
binding. The beads are then washed with numerous
cycles of suspension in phosphate buffered saline
(PBS) with 0.05% Tween® 20 and recovered by drawing
a strong magnet along the sides of a plastic tube.
The beads are then stored under refrigeration,
until use.

An aliquot of a library is mixed with a sample
of resuspended beads, at 4°C for a time period in
the range of 2-24 hrs. The magnetic beads are then
recovered with a strong magnet and the liquid is
removed by aspiration. The beads are then washed
by resuspension in PBS with 0.05% Tween® 20, and
then drawing the beads to the tube wall with the
magnet. The contents of the tube are removed and
washing is repeated 5-10 additional times. 50 mM
glycine-HCl (pH 2.0), 100 μg/ml BSA solution is
added to the washed beads to denature proteins and
release bound phage. After a short incubation, the
beads are drawn to the side of the tubes with a
strong magnet, and the liquid contents are then
transferred to clean tubes. 1 M Tris-HCl (pH 7.5)
or 1 M NaH2PO4 (pH 7) is added to the tubes to
neutralize the pH of the phage sample. The phage
are then diluted, e.g., 10"* to 10**, and aliquots
plated with E. coli DH5αF' cells to determine the
number of plaque forming units of the sample. In
certain cases, the platings are done in the
presence of XGal and IPTG for color discrimination
of plaques (i.e., lacZ+ plaques are blue, lacZ-
plaques are white). The titer of the input samples
is also determined for comparison.
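
By way of illustration only, the titer determination and the
comparison of eluted phage with input phage can be sketched as
below. The plaque counts, dilution factors, and plated volumes in
the sketch are hypothetical placeholders, not values taken from this
specification, and the function names are likewise assumptions. The
calculation uses the standard relation that the titer equals the
plaque count divided by the product of the dilution factor and the
volume plated.

```python
# Illustrative sketch only; all numeric inputs are hypothetical.

def titer_pfu_per_ml(plaque_count: int, dilution: float,
                     volume_plated_ml: float) -> float:
    """Plaque forming units per ml of the undiluted sample.

    dilution is the fraction of the original sample in the plated
    aliquot (e.g. 1e-3 for a thousandfold dilution).
    """
    if plaque_count < 0:
        raise ValueError("plaque_count must be non-negative")
    if dilution <= 0 or volume_plated_ml <= 0:
        raise ValueError("dilution and volume must be positive")
    return plaque_count / (dilution * volume_plated_ml)


def recovery_fraction(output_pfu: float, input_pfu: float) -> float:
    """Fraction of input phage recovered after one round of panning."""
    if input_pfu <= 0:
        raise ValueError("input_pfu must be positive")
    return output_pfu / input_pfu


# Hypothetical example: 50 plaques from 0.1 ml of a 1e-3 dilution.
eluted_titer = titer_pfu_per_ml(50, 1e-3, 0.1)
```

Comparing the recovery fraction across successive rounds gives a
simple indication of whether binders are being enriched.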

Alternatively, as yet another non-limiting example,
screening a diversity library of phage expressing peptides
can be achieved by panning using microtiter plates (see PCT
Publication No. WO 94/18318) as follows:

The target molecule is diluted and a small
aliquot of target molecule solution is adsorbed
onto wells of microtiter plates (e.g., by incubation
overnight at 4°C). An aliquot of BSA solution (1
mg/ml, in 100 mM NaHCO3, pH 8.5) is added and the
plate incubated at room temperature for 1 hr. The
contents of the microtiter plate are flicked out
and the wells washed carefully with PBS-0.05%
Tween® 20. The plates are repeatedly washed free
of unbound target molecules. A small aliquot of
phage solution is introduced into each well and the
wells are incubated at room temperature for 2-24
hrs. The contents of microtiter plates are flicked
out and washed repeatedly. The plates are
incubated with wash solution in each well for 20
minutes at room temperature to allow bound phage
with rapid dissociation constants to be released.
The wells are then washed five more times to remove
all unbound phage.

To recover the phage bound to the wells, a pH
change is used. An aliquot of 50 mM glycine-HCl
(pH 2.0), 100 μg/ml BSA solution is added to the
washed wells to denature proteins and release bound
phage. After 10 minutes at 65°C, the contents are
then transferred into clean tubes, and a small
aliquot of 1 M Tris-HCl (pH 7.5) or 1 M NaH2PO4 (pH
7) is added to neutralize the pH of the phage
sample. The phage are then diluted, e.g., 10"* to
10-*, and aliquots plated with E. coli DH5αF' cells
to determine the number of plaque forming units
of the sample. In certain cases, the platings are
done in the presence of XGal and IPTG for color
discrimination of plaques (i.e., lacZ+ plaques are
blue, lacZ- plaques are white). The titer of the
input samples is also determined for comparison
(dilutions are generally 10** to 10'*).

By way of another example, diversity libraries
expressing peptides as a surface protein of either a particle
or a host cell, e.g., phage or bacterial cell, can be
screened by passing a solution of the library over a column
of target molecule immobilized to a solid matrix, such as
sepharose, silica, etc., and recovering those particles or
host cells that bind to the column after washing and elution.

In yet another embodiment, screening a library can be
performed by using a method comprising a first "enrichment"
step and a second filter lift step as described in PCT
Publication No. WO 94/18318.

Several rounds of serial screening are preferably
conducted. In a particularly preferred aspect, each round is
varied slightly, e.g., by changing the solid phase on which
immobilization occurs, or by changing the method of
immobilization on (e.g., by changing the linker to) the solid
phase. When using a phage display library, the recovered
cells are then preferably plated at a low density to yield
isolated colonies for individual analysis. By way of
example, the following is done: The individual colonies are
selected, grown and used to inoculate LB culture medium
containing ampicillin. After overnight culture at 37°C, the
cultures are then spun down by centrifugation. Individual
cell aliquots are then retested for binding to the target
molecule attached to the beads. Binding to other beads,
having attached thereto a non-relevant molecule, can be used
as a negative control.

In a specific embodiment, different rounds of screening
can respectively involve selection against targets in
primarily their purified form, and then in their natural
state (e.g., on the surface of a mammalian cell) (see, e.g.,
Marks et al., 1993, Bio/Technology 11:1145-1149, describing
selection against cell surface blood group antigens).

In other examples, subsequent rounds of screening can
involve immobilization of the target molecule by attachment
at different ends (e.g., amino- or carboxy-terminus) of the
target molecule to a solid support, or presentation of
library members by attachment to or fusion at different ends
of the library members.

By way of other examples of screening methods that can
be used, genetic selection methods can be adapted for
screening of libraries, or can be used in a recursive scheme.
Thus, in a specific aspect, the invention provides screening
methods in which methods allowing high throughput and
diversity screening (e.g., screening phage display or
polysome libraries against a ligand) are utilized in initial
rounds, with subsequent rounds employing a genetic selection
technique, in which the presence of a binder of appropriate
specificity increases the activity of or activation of a
transcriptional promoter or origin of replication. Genetic
selection techniques that can be adapted for use (e.g., by
inserting random oligonucleotides in the test plasmid)
include the two-hybrid system for selecting interacting
proteins in yeast, replicative based systems in mammalian
cells, and others (see, e.g., Fields & Song, 1989, Nature
340:245-246; Chien et al., 1991, Proc. Natl. Acad. Sci. USA
88:9578-9582; Vasavada et al., 1991, Proc. Natl. Acad. Sci.
USA 88:10686-10690). Thus, in a specific embodiment,
compounds are produced as fusion proteins, and contacted with
a different fusion protein comprising a target fused to
another molecule, in which specific binding of the fusion
proteins to each other results in an increase in activity or
activation of a transcriptional promoter or an origin of
replication. In a specific embodiment, a genetic selection
method is used in a later round of screening to either select
directly for a library member that binds to a target
molecule, or to select a library member that competitively
inhibits binding of a ligand to the target molecule.
Several exemplary methods for screening a phage/phagemid
library are presented by way of example in Section 6.4
hereinbelow. An exemplary method for screening a
polysome-based library is presented in Section 6.3.3 hereinbelow.

Once binders which bind to a target molecule of interest
are selected from a diversity library, additional assays are
preferably, although optionally, performed, including but not
limited to those described below. Thus, in vivo or in vitro
assays can be performed to test whether binding of a binder
to the target molecule affects the target molecule's
biological activity; binders that exert such an effect are
preferred for use in subsequent steps of the invention.
Alternatively, or in addition, competitive binding assays can
be carried out to test whether the binder competes with other
binders or with a natural ligand of the target molecule for
binding to the target molecule; binders that compete with
each other, and that compete with the natural ligand, are
preferably selected for use in subsequent steps of the
invention. Alternatively, or in addition to the above
assays, the binding affinity of binders for the target
molecule is determined, by standard methods, or by way of
example, as described in Section 6.5 infra. Binders of the
highest affinity are preferred for use in subsequent steps of
the invention.



5.4. DETERMINING THE SEQUENCE OR
CHEMICAL FORMULA OF BINDERS

Many of the references cited in Sections 5.2 and 5.3
hereinabove, which disclose library construction and/or
screening, also disclose methods that can be used to
determine the sequence or chemical formula of binders
isolated from such libraries. By way of example, a nucleic
acid which expresses a binder can be identified and recovered
from a peptide expression library or from a polysome-based
library, and then sequenced to determine its nucleotide
sequence and hence the deduced amino acid sequence that
mediates binding. (In an instance wherein the sequence of an
RNA is desired, cDNA is preferably made and sequenced.)
Alternatively, the amino acid sequence of a binder can be
determined by direct determination of the amino acid sequence
of a peptide selected from a peptide library containing
chemically synthesized peptides. In a less preferred aspect,
direct amino acid sequencing of a binder selected from a
peptide expression library can also be performed.

Nucleotide sequence analysis can be carried out by any
method known in the art, including but not limited to the
method of Maxam and Gilbert (1980, Meth. Enzymol. 65:499-
560), the Sanger dideoxy method (Sanger et al., 1977, Proc.
Natl. Acad. Sci. U.S.A. 74:5463), the use of T7 DNA
polymerase (Tabor and Richardson, U.S. Patent No. 4,795,699;
Sequenase™, U.S. Biochemical Corp.), or Taq polymerase, or
use of an automated DNA sequenator (e.g., Applied Biosystems,
Foster City, CA).

Direct determination of the chemical formulas of non- 
peptide or peptide binders can be carried out by methods well 
known in the art, including but not limited to mass 
spectrometry, NMR, infrared analysis, etc. 

In preferred aspects involving certain types of
libraries well known in the art, sequencing or the use of
known analytic techniques for chemical formula determination
will not be necessary. In some such libraries, the identity
and composition of each member of the library is uniquely
specified by a label or "tag" which is physically associated
with it and hence the compositions of those members that bind
to a given target are specified directly (see, e.g., Ohlmeyer
et al., 1993, Proc. Natl. Acad. Sci. USA 90:10922-10926;
Brenner et al., 1992, Proc. Natl. Acad. Sci. USA
89:5381-5383; Lerner et al., PCT Publication No.
WO 93/20242). In other examples of such libraries, the
library members are created by stepwise synthesis protocols
accompanied by complex record keeping, complex mixtures are
screened, and deconvolution methods are used to elucidate
which individual members were in the sets that had binding
activity, and hence which synthesis steps produced the
members and the composition of individual members (see, e.g.,
Erb et al., 1994, Proc. Natl. Acad. Sci. USA 91:11422-11426).

Step 2 of the invention provides as output N binding
library members (binders) and their sequences or chemical
formulas.


5.5. CANDIDATE PHARMACOPHORE SELECTION 
The prior diversity library screening, step 2,
determines a set of size N of specifically binding members
from one or more diversity libraries. While the binders are
preferably but not necessarily isolated from one or more
diversity libraries (e.g., binders need not be isolated from
diversity libraries; known binders can be simply provided),
the following description shall refer to the preferred
embodiment wherein diversity library members are the binders.
It will be apparent that the description is also readily
applicable to binders that are not isolated from diversity
libraries.

The pharmacophore responsible for the library member
binding is preferably determined by an overall select and
test method in this and subsequent steps. In general, a
pharmacophore is specified by the precise electronic
properties on the surface of the binder that cause binding
to the surface of the target molecule. In the preferred
embodiment, these properties are specified by the underlying,
causative, chemical structures. Chemical structures are
specified generally by groups such as -CH2-, -COOH, and
-CONH2. The preferred pharmacophore representation consists
of a specification of the underlying chemical groups and
their geometric relations. The more precisely the geometric
relations are specified, the more preferred. In preferred
but not limiting aspects, the geometric relations are precise
to at least 0.50 Å, and most preferably, at least 0.25 Å. A
pharmacophore will usually comprise 2 to 4 of such groups,
with 3 being typical. However, for complex protein
recognition targets, a pharmacophore may comprise a greater
number of groups. For example, it is possible that the
entire 6 amino acid sequence, -X6-, may be needed for a member
of the preferred CX6C library to bind to complex targets, in
which case the pharmacophore includes the entire binder.

Considering, by way of example, the case of binders
isolated from the preferred library, of sequence CX6C, the
chemical groups defining a peptide pharmacophore are terminal
groups on amino acid side chains. Typically, therefore, a
sequence of two to four contiguous amino acids will contain
the pharmacophore of interest. For example, Fig. 11
illustrates an Arginine-Glycine-Aspartate sequence forming a
well known platelet aggregation inhibiting pharmacophore,
which is defined by the positions and orientations of the
adjacent -CN3H4, -CαH2-, and -COOH groups. Pharmacophores
formed by discontiguous amino acids are not likely to occur
in the preferred library due to the conformational constraint
on the short peptide imposed by the disulfide bridge.

The selection step determines candidate amino acid
sequences in each binder that define a candidate
pharmacophore by the positions of their terminal groups.
Candidate selection depends substantially only on the
chemical structures of the amino acid side chains and
terminal groups (only very rarely on backbone groups).
Geometric structure is not yet available and cannot be used
for candidate selection. In the preferred embodiment, amino
acids are grouped into homologous groups defined by group
members having similar side chain structure and activity (see
infra). Candidate pharmacophores are found by searching the
sequences of the N binders for short sequences of homologous
amino acids. This search will produce at least one
candidate, because all the binders share the actual
pharmacophore. Several candidates will usually be found
since geometric information is ignored, and the search is
thereby underdetermined.

Fig. 2A illustrates an exemplary method of performing
the search for homologous sequences. Although this method is
illustrated as searching for homologous contiguous sequences
of length 3, it is easily adaptable to search for homologies
of other lengths and also for discontiguous homologous
sequences. If no candidate pharmacophores of length 3 have a
consistent consensus structure, then pharmacophores of length
2, 4, or longer or discontiguous sequences must be searched
and selected for test. For some complex targets, the
pharmacophore may include the entire variable part of the
library member. The exemplary method is a simple depth-first
search for matching amino acid strings. More sophisticated
string search methods are known and are equally applicable to
this invention.

The method begins with the administrative steps 201 and
202 of labeling the binders with integers from 1 to N and
assigning the string variable 'ABC' to the next leftmost
sequence of three amino acids to test in binder 1. If this
is the first candidate selection, 'ABC' will be at the
leftmost position in binder 1. If prior candidates have been
selected, 'ABC' will be assigned one amino acid to the right
of its prior assignment. The FOR loop, formed by steps 203,
206, and 207, then selects each binder from 2 to N for
scanning for a sequence homologous to 'ABC'. Step 203 does
loop administration. Step 206 does the scanning. If
homologous sequences are found, test 207 loops back to scan
the next binder. If homologous sequences have been found in
all binders from 2 to N, the loop exits at step 204. In this
case 'ABC' is a string in binder 1 which is homologous to
other strings in all remaining binders and is thus a
candidate pharmacophore. The method exits at 205 for this
candidate to be structured and tested for whether it is the
actual pharmacophore. If a binder does not have a sequence
homologous to 'ABC', then this string is not a candidate. In
this case, test 208 determines if 'ABC' is at the right end
of binder 1. If so, there are no more homologies to test for
and the method exits at 209. If not, then 'ABC' is advanced
one amino acid to the right at step 210 and the scan of all
binders is repeated beginning at 203.

Fig. 2B illustrates how string variable 'ABC' is scanned
across binder 1, represented schematically by 220. First,
'ABC' is assigned to X1X2X3 at 221, then to X2X3X4 at 222, to
X3X4X5 at 223, and finally to X4X5X6 at 224.

Given an assignment to 'ABC', step 206 scans each other
binder, for example binder K with K>1, for homologous
sequences. This is simply done by comparing all contiguous
substrings of binder K with 'ABC' to determine if they are
homologous. They are homologous if corresponding amino acids
in the substring and 'ABC' are homologous. In turn, two
amino acids are homologous if they satisfy established
homology rules. Each homologous sequence found in binder K
defines a separate candidate pharmacophore, if sequences
homologous to 'ABC' are found in all other binders.
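
By way of illustration only, the scan described above can be
sketched as follows. This is a minimal sketch and not the program
listing of this specification: it enumerates every window of
binder 1 in one pass rather than exiting at each candidate and
resuming as in Fig. 2A, it uses one-letter amino acid codes, and it
assumes the four side chain classes given in the discussion of
homology rules. The names HOMOLOGY_CLASSES, homologous, and
find_candidates, and the three example binder strings, are
hypothetical.

```python
from typing import Dict, List, Tuple

# One-letter codes grouped by the four side chain classes in the text.
HOMOLOGY_CLASSES: Dict[str, str] = {
    "nonpolar": "ALIVPFWM",
    "polar_neutral": "GSTCYNQ",
    "basic": "RKH",
    "acidic": "DE",
}
CLASS_OF: Dict[str, str] = {
    aa: name for name, members in HOMOLOGY_CLASSES.items() for aa in members
}


def homologous(a: str, b: str) -> bool:
    """True if a and b have equal length and match class by class."""
    return len(a) == len(b) and all(
        CLASS_OF[x] == CLASS_OF[y] for x, y in zip(a, b)
    )


def find_candidates(
    binders: List[str], length: int = 3
) -> List[Tuple[str, int, List[List[int]]]]:
    """Slide a window over binder 1 and keep windows matched everywhere.

    Returns (window, start_in_binder_1, hits), where hits[k] lists the
    start positions of homologous substrings in binders[k + 1].
    """
    first, others = binders[0], binders[1:]
    candidates = []
    for start in range(len(first) - length + 1):
        window = first[start:start + length]
        hits: List[List[int]] = []
        for other in others:
            found = [
                i for i in range(len(other) - length + 1)
                if homologous(window, other[i:i + length])
            ]
            if not found:      # this binder has no homologous substring
                break
            hits.append(found)
        else:                  # every other binder matched the window
            candidates.append((window, start, hits))
    return candidates


# Hypothetical variable regions of three binders (X6 only).
example = find_candidates(["ARGDLK", "WKGEVS", "TTHSDA"])
```

In this hypothetical example two windows of binder 1 survive (RGD
and GDL), which illustrates why several candidates are usually found
when geometric information is ignored.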

In a case where discontiguous homologous sequences are
sought, 'ABC' is assigned to amino acids in discontiguous
positions in binder 1 and then compared for homologies to
amino acids in the same relative positions throughout the
other binders.

Various rules of amino acid homology may be used in this
invention. In the preferred embodiment, amino acids are
homologous if they are found in the same class of amino
acids, based on side chain activity (see Lehninger,
Principles of Biochemistry (1982), chap. 5). Preferred
homologous groups of amino acids are as follows. The
nonpolar (hydrophobic) amino acids include alanine, leucine,
isoleucine, valine, proline, phenylalanine, tryptophan and
methionine. The polar neutral amino acids include glycine,
serine, threonine, cysteine, tyrosine, asparagine, and
glutamine. The positively charged (basic) amino acids
include arginine, lysine and histidine. The negatively
charged (acidic) amino acids include aspartic acid and
glutamic acid. The foregoing classes may be modified by
those skilled in chemical arts to create finer
classifications. For example, phenylalanine and tryptophan
could be placed in a separate aromatic nonpolar group.
Further, homology rules could depend on amino acid sequence,
such as by dividing contiguous doublets or triplets of amino
acids into homology groups.
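
By way of illustration only, a finer classification of the kind
just described can be derived from the four preferred classes by
moving selected amino acids into a new class. The sketch below is
an assumption-laden illustration: the names BASE_CLASSES, refine,
and class_lookup are hypothetical, and one-letter amino acid codes
are used for brevity.

```python
from typing import Dict

# The four preferred classes, in one-letter codes.
BASE_CLASSES: Dict[str, str] = {
    "nonpolar": "ALIVPFWM",
    "polar_neutral": "GSTCYNQ",
    "basic": "RKH",
    "acidic": "DE",
}


def refine(classes: Dict[str, str], new_name: str,
           members: str) -> Dict[str, str]:
    """Return a copy of classes with members moved into a new class."""
    if new_name in classes:
        raise ValueError(f"class {new_name!r} already exists")
    moved = set(members)
    refined = {
        name: "".join(aa for aa in group if aa not in moved)
        for name, group in classes.items()
    }
    refined[new_name] = members
    return refined


def class_lookup(classes: Dict[str, str]) -> Dict[str, str]:
    """Invert a class table into an amino-acid-to-class mapping."""
    return {aa: name for name, group in classes.items() for aa in group}


# Phenylalanine (F) and tryptophan (W) placed in their own group.
finer = refine(BASE_CLASSES, "aromatic_nonpolar", "FW")
lookup = class_lookup(finer)
```

Under the refined table, phenylalanine and leucine are no longer
homologous, so fewer candidate sequences are expected to survive
the search.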
The invention is not limited to the above-described
exemplary method of selecting candidate pharmacophores. Any
automatic method of selecting candidates that depends only on
chemical structure of binder library members, preferably
expressed in terms of building block composition and
sequence, can be used. For example, in the case of the
preferred CX6C library, candidates could be selected by a
clustering analysis performed on the entire amino acid string
in a multi-dimensional space.

The above method of selecting candidate pharmacophores
is not limited to the preferred CX₆C diversity library. For
example, this method is immediately applicable to any
diversity library having members comprising building blocks
linked by a linear backbone by simply specifying rules of
homology appropriate for the building blocks. These homology
rules would group building blocks presenting similar
structure and reactivity to targets. This method then
selects candidates comprising sequences of homologous
building blocks present on all the binding library members.
If the library members do not have a linear backbone, a
related candidate selection method can be used. In this
case, the search for homologous building blocks would need to
be confined to adjacent building blocks. Adjacent building
blocks in this case are those building blocks brought
physically close by whatever chemical structures form the
library members (instead of simply being linearly adjacent on
a backbone). An adjacency determination would be specific to
the particular chemical structure and would be algorithmically
specified. In addition, appropriate rules of homology would
be specified. The method would then select candidates
comprising groups of adjacent, homologous building blocks, a
group being present on each binding library member.
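
The selection step for a linear-backbone library can be sketched as follows. This is a minimal illustration under stated assumptions, not the selection code of this specification: the class labels, the function names kmer_signatures and candidate_pharmacophores, and the two example binder sequences are all hypothetical.

```python
# Residue -> homology class label ("n" nonpolar, "p" polar neutral,
# "+" basic, "-" acidic). The labels are an illustrative assumption.
CLASS_OF = {}
for label, members in {"n": "ALIVPFWM", "p": "GSTCYNQ",
                       "+": "RKH", "-": "DE"}.items():
    for aa in members:
        CLASS_OF[aa] = label


def kmer_signatures(peptide, k):
    """Return {class signature: [start positions]} for each contiguous k-mer."""
    found = {}
    for i in range(len(peptide) - k + 1):
        sig = "".join(CLASS_OF[aa] for aa in peptide[i:i + k])
        found.setdefault(sig, []).append(i)
    return found


def candidate_pharmacophores(binders, k=3):
    """Class signatures of homologous k-mers that occur in every binder.

    The result maps each shared signature to one list of start positions
    per binder, in the order the binders were given.
    """
    per_binder = [kmer_signatures(b, k) for b in binders]
    shared = set(per_binder[0])
    for sigs in per_binder[1:]:
        shared &= set(sigs)
    return {sig: [sigs[sig] for sigs in per_binder] for sig in sorted(shared)}


# Hypothetical binders, for illustration only.
example = candidate_pharmacophores(["CARGDLFC", "CKGEVWAC"], k=3)
```

Each shared signature is one candidate; if the later consensus test fails for it, the next signature is tried.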
The above-described step is the selection step of the
overall select and test method. Distance measurements and
Monte Carlo structuring, steps 4 and 5, determine a consensus
pharmacophore structure for the candidate, if possible. If a
consensus is found, the candidate is the actual
pharmacophore. If a consensus is not found, this selection
step must be revisited, and a new candidate selected for
test.

5.6. INTRAMOLECULAR DISTANCE MEASUREMENTS

Having obtained N binders, their chemical building block
structures (chemical formula or primary sequence), and the
identification of a candidate pharmacophore in each binder,
steps 4 and 5 of the method of this invention cooperatively
determine a precise spatial structure for the candidate
pharmacophore (if it exists; if not, a new candidate
pharmacophore is selected). In the preferred (but not
limiting) embodiment of this invention, N members of the CX₆C
library that specifically bind to the protein target of
interest have been screened; their sequences determined; and
a candidate pharmacophore consisting of homologous triplets
(more generally from 2 to 6 mers) of amino acids has been
determined in each binder.

Step 4 measures one or more strategic interatomic
distances, preferably no more than 10-20, e.g., 1-10 or, more
preferably, 1-5. The
remainder of the structure is determined in subsequent steps,
other than by direct measurement. The interatomic distances
measured in step 4 are preferably measured with an accuracy of
at least 2 Å, more preferably at least 1 Å or 0.5 Å or 0.25 Å,
and most preferably at least 0.05 Å. Thus, in a preferred
but not limiting embodiment, distances in the pharmacophore
are specified to at least approximately 0.25 Å. Step 5,
using the CCMBC computational method, then completes
determination of the pharmacophore structure at a high
resolution and the structures of the rest of the binder
molecules with a secondary resolution. Having a high
resolution structure for the pharmacophore of interest is
orders of magnitude more useful than having a low resolution
structure for an entire binder. Consequently, steps 4 and 5
focus resources on the former problem.
A distance measurement method is preferred for use if it
meets certain conditions, as follows. First, accuracy of
distance measurements is preferably at least 0.25
Å for distances on the order of those between amino acids in
a peptide. Second, measurement conditions preferably
approximate target binding conditions, i.e., are
approximately physiologic. For example, crystallization,
which may induce conformational changes, is preferably
avoided. Also, the employed measurement methods preferably
allow one binder sample to be measured when dry, when
hydrated and when bound to the target molecule of interest,
thereby observing the effects of water and conformational
changes on binding. Third, the measurement method is
preferably quick and inexpensive.

These conditions convey important advantages.
First, as the method of the invention determines
high resolution pharmacophore structures, use of distances
less accurate than the intended results would almost
certainly result in decreased resolution. Second, as the
CCMBC structure determination method approximates the
structural effects of hydration and target binding, use of
accurate distances including the physical effects of
hydration or binding helps increase the resolution of the
computational results. These distances as used in the CCMBC
method pull the binder structures towards a more accurate
representation both of the bound, hydrated pharmacophore and
also of the remainder of the binder molecule without a
computationally burdensome inclusion of water molecules and
without knowledge of the target molecule's structure.
REDOR NMR is the preferred method of distance
determination. REDOR is a solid phase NMR technique which
directly measures the inter-nuclear dipole-dipole interaction
strength between two spin ½ nuclear species, denoted D_AB, where
A and B are the two nuclear species measured. The
inter-nuclear distance between A and B is simply determined
from D_AB by the following equation:

$$D_{AB} = \frac{\gamma_A \gamma_B h}{2\pi R_{AB}^{3}} \qquad (1)$$

where R_AB is the inter-nuclear distance, h is Planck's
constant, and γ_A and γ_B are the respective gyromagnetic
ratios of nuclei A and B. REDOR is typically accurate to
within 0.05 Å and can generally measure distances up to
about 8 Å.
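
The inverse-cube relation between coupling strength and distance can be made concrete with a short numerical sketch. It is an illustration only and assumes SI units, in which the coupling expressed in Hz is (μ0/4π)·γ_A·γ_B·ħ/(2π·R³); the gyromagnetic ratio magnitudes are standard tabulated values, and the function names are hypothetical.

```python
import math

MU0_OVER_4PI = 1.0e-7        # T*m/A, SI magnetic constant divided by 4*pi
HBAR = 1.054571817e-34       # J*s, reduced Planck constant

# Magnitudes of gyromagnetic ratios in rad s^-1 T^-1 (standard tabulated values).
GAMMA = {
    "13C": 6.728284e7,
    "15N": 2.712618e7,
    "31P": 1.08394e8,
    "19F": 2.51815e8,
}


def _coupling_constant(a, b):
    """Coupling in Hz multiplied by R^3 in m^3 for nuclei a and b."""
    return MU0_OVER_4PI * GAMMA[a] * GAMMA[b] * HBAR / (2.0 * math.pi)


def dipolar_coupling_hz(r_angstrom, a="13C", b="15N"):
    """Dipole-dipole coupling strength in Hz at a distance given in angstroms."""
    r = r_angstrom * 1.0e-10
    return _coupling_constant(a, b) / r ** 3


def distance_angstrom(d_hz, a="13C", b="15N"):
    """Invert the relation: inter-nuclear distance in angstroms from D in Hz."""
    return (_coupling_constant(a, b) / d_hz) ** (1.0 / 3.0) * 1.0e10
```

Because the coupling varies as the inverse cube of the distance, a given fractional error in the measured coupling produces only one third of that fractional error in the distance.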

Any two nuclear species observable and resolvable by NMR
methods and, preferably, adaptable to chemical inclusion in
diversity library members of interest, may be the basis
of REDOR measurements. Although the subsequent description
is often directed to distance determinations between ¹³C and
¹⁵N nuclei in members of a preferred library comprising the
sequence CX₆C, this invention is not so limited. One skilled
in the art can readily adapt the method for use in making
measurements of other types of molecules (e.g., peptides and
nonpeptides); additionally, other nuclear species may be
used. Other common spin ½ species that can be used include
but are not limited to ³¹P and the halogen ¹⁹F.

General references on NMR techniques are Slichter,
Principles of Magnetic Resonance, Berlin, Springer-Verlag
(1989) and Mehring, High Resolution NMR in Solids, Berlin,
Springer-Verlag (1983). REDOR references include Gullion et
al., Rotational-echo double-resonance NMR, J. Magn. Res.
81:196-200 (1989); Pan et al., Determination of C-N
internuclear distances by rotational-echo double-resonance NMR
of solids, J. Magn. Res. 90:330-40 (1990); Garbow et al.,
Determination of the molecular conformation of melanostatin
using 13C, 15N-REDOR NMR spectroscopy, J. Am. Chem. Soc.
115:238-44 (1993), all of which are incorporated herein by
reference.
Other solid phase NMR techniques are applicable but less
preferred. These include but are not limited to those
disclosed in Kolbert et al., Measurement of internuclear
distances by switched angle spinning, J. Physical Chemistry
98:7936 et seq. (1994), and in Raleigh et al., Rotational
Resonance NMR, Chemical Physics Letters 146:71 (1988). These
techniques measure homonuclear distances only to 0.5 Å
accuracy and are less accurate than REDOR. Liquid phase NMR
techniques of NOE (nuclear Overhauser effect) and COSY
(correlation spectroscopy) can also be used but are
less preferred. They require complex interpretation to
obtain comparable distance accuracy better than 0.5 Å in
small molecules with complete rotational freedom.

X-ray crystallography can also be used, although it is
much less preferred, since crystallization may induce
conformational changes in the binder, and since binding to
the target molecule may be necessary for crystallization.

In the case of REDOR measurements of the heteronuclear
distances between ¹³C and ¹⁵N, ¹³C and ¹⁵N are introduced
("labeled") at the positions between which a distance
measurement is needed. The preferred embodiment of the
invention measures the ¹⁵N NMR resonance. Since nearly all
the ¹⁵N signal will originate with nuclear labels, very little
background signal due to natural abundance nuclei need be
accounted for. Alternatively, the ¹³C resonance may be
measured, in which case the natural abundance background is
subtracted from the measurements.

Since REDOR depends on observing the internuclear
dipole-dipole interaction, the binder being measured should
be substantially stationary on the time scale of the NMR
signal. The measurement system preferably ensures this
condition. The substrate holding the binder to be measured
can be chosen so as to restrain binder motion, or the
measured sample may be cooled to restrain motion, or,
alternatively, the binder may be bound to its target molecule
in order to restrain its motion.
Further details of the REDOR distance measurements will
make reference to Fig. 3. This illustrates the measurement
method for one labeling of one binder, which is repeated if
the binder requires multiple labelings and also is repeated
for each binder. Subsequent description will focus on only
one binder.

Step 41 chooses a binder labeling. Labeling is
preferably done to obtain the most information about the
pharmacophore consistent with chemical labeling opportunities
and available labeled amino acids. Backbone labeling, for
example, labels the amide N of one amino acid and one of the
backbone C's of a next adjacent or more distant amino acid.
Backbone labeling is typically done in the backbone in the
vicinity of the candidate pharmacophore. It might also be
done away from a candidate pharmacophore to confirm a
previously determined structure as described for step 6.
Side chain labeling strategies vary with the chemical
opportunities offered by the candidate pharmacophore. If a
terminal N is available, an adjacent side chain or backbone C
can be labeled. If not, the side chain C and backbone amino
N can be labeled. Side chain labeling is preferably on side
chains in the candidate pharmacophore. Preferred labeling in
the candidate pharmacophore is either a backbone amino N and
a nearby backbone C or a side chain C or, if available, a
side chain amino N and an adjacent or nearby side chain C.

In an alternative embodiment, to get the most structural
information on the binders, these labelings are designed to
select the actual major conformation from known possible
conformations. For example, if it is known from preliminary
determinations that a binder may exist in one of a few, e.g.
two, major backbone or side chain folding patterns, the
labelings are chosen to distinguish these conformations.
Nuclear pairs labeled for measurement are preferably those
that have significantly different distances in the possible
conformations.
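
One simple way to rank labeling choices under this alternative embodiment is sketched below. It is an assumption-laden illustration, not a procedure given in this specification: the function name best_label_pair, the atom names and the toy coordinates are hypothetical, and the 8 Å default reflects only the approximate REDOR range noted above.

```python
import math
from itertools import product


def best_label_pair(conf_a, conf_b, carbons, nitrogens, max_range=8.0):
    """Pick the C/N pair whose distance differs most between two conformations.

    conf_a and conf_b map atom names to (x, y, z) coordinates in angstroms.
    Pairs farther apart than max_range in either conformation are skipped.
    Returns (difference, carbon, nitrogen, distance_a, distance_b) or None.
    """
    best = None
    for c, n in product(carbons, nitrogens):
        da = math.dist(conf_a[c], conf_a[n])
        db = math.dist(conf_b[c], conf_b[n])
        if max(da, db) > max_range:
            continue
        gap = abs(da - db)
        if best is None or gap > best[0]:
            best = (gap, c, n, da, db)
    return best


# Toy coordinates for two hypothetical folding patterns, illustration only.
fold_1 = {"C1": (0.0, 0.0, 0.0), "C2": (3.0, 0.0, 0.0), "N1": (4.0, 0.0, 0.0)}
fold_2 = {"C1": (0.0, 0.0, 0.0), "C2": (3.0, 0.0, 0.0), "N1": (0.0, 2.0, 0.0)}
choice = best_label_pair(fold_1, fold_2, ["C1", "C2"], ["N1"])
```

A pair with a large distance difference allows a single measurement to discriminate between the candidate folding patterns.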

Multiple labeling of one binder to determine multiple
distances at once is possible, for example, by including one
¹³C and several ¹⁵N nuclei, or vice versa, in one labeled
molecule. Multiple labeling is limited, however, as is
obvious to one skilled in the NMR arts, by chemical shifts of
the various nuclear resonances. REDOR measurement of
multiple ¹⁵N-¹³C distances requires that each spectroscopically
observed ¹⁵N or ¹³C resonance have a distinguishable chemical
shift. If these conditions are not met, several separately
labeled versions of the binder are prepared and measured, one
for each internuclear distance sought.

Step 42 synthesizes the labeled binder after a labeling
has been determined by applying these preferences and rules.
In an embodiment wherein the binder is a peptide, variously
¹³C labeled or ¹⁵N labeled amino acid reagents for the
synthesis of the labeled binder are widely available from
commercial sources. A preferred supplier is Isotec Inc.
(Miamisburg, OH). Other commercial sources include MSD
Isotopes (Montreal, Canada) and Sigma Chemical Co. (St.
Louis, MO). Step 42 has three substeps: linear peptide
synthesis 43, cyclization 44 (by forming the disulfide bond),
and deprotection of the side groups 45. Synthesis and side
chain deprotection are performed by solid phase peptide
synthesis using standard Boc (tert-butoxycarbonyl) and Fmoc
(9-fluorenylmethyloxycarbonyl) chemistry. Exemplary
references for this method are Merrifield, J. Amer. Chem.
Soc., vol. 85, pp. 2149 et seq. (1963); Carpino et al., J.
Amer. Chem. Soc. (1970); and Stewart et al., Solid Phase
Peptide Synthesis, Berlin, Springer-Verlag (1984), which are
herein incorporated by reference. Cyclization is by
conventional mild oxidation, well known in the chemical arts.
The method of these steps is detailed in Example 2 supra.
To obtain accurate REDOR NMR measurements, the binder
sample is preferably highly purified. Accordingly, it is
preferable that the sample be at least 90% pure (but not
necessary if spurious NMR signals can be discriminated), and
even more preferable that the sample be at least 95% pure.
Such pure samples can be obtained as follows. In a first
synthesis method, the binder peptide is synthesized directly
on the substrate to be used in the subsequent NMR
measurements. In this case particular care is preferably
taken with the standard solid phase synthesis steps of
Example 2. By way of example, synthesis reagents should be
pure, adequate time should be allowed for diffusion of
reagents and solvents throughout the interstices of the
substrate resin, and between steps, prior reagents should be
thoroughly washed from the resin before new reagents are
applied. That the purity, reaction time, and washings are
adequate is gauged by subsequent analysis. An aliquot of the
resulting peptide-resin is taken, the peptide is cleaved
(Example 2) and its purity analyzed by mass spectroscopy or
high performance liquid chromatography (HPLC).

In a second synthesis method, the peptide can be
synthesized on any convenient solid phase substrate in a
standard manner and then cleaved from the substrate. The
peptide is purified by standard methods (e.g., HPLC) and then
attached to the NMR measurement substrate. The attachment
can be done by any methods known in the art, preferably at
either the amino- or carboxy-terminus, e.g., by condensation
of the free carboxy terminal group on the peptide with an
amino labeled resin, with the attachment step preceding
deprotection of any side chain carboxy groups on the peptide;
by use of heterofunctional linker groups, etc.
Great care is preferably exercised in forming the
binder-substrate used for the REDOR NMR measurements. This
invention is also directed to binder-substrates suitable to
precise REDOR NMR measurements in the following environmental
conditions: dry unbound, hydrated unbound, and bound to its
molecular target molecule (e.g., in lyophilized or hydrated
forms).

For any binder and any NMR measurement substrate
utilized, the substrate should restrain the attached binder
sufficiently so that binder motion will not average out the
dipole-dipole interactions necessary for the REDOR
measurement. Generally, this requires that the frequency of
motion of the binder be less than the frequency of the
dipole-dipole interaction being observed, which varies with
the nuclear species being observed and the measurement
distance. For ¹³C-¹⁵N observations to 2.5 Å the binder motion
frequency should be less than approximately 200 Hz; for
observations to 5 Å, less than approximately 30-50 Hz; and
for observations beyond 5 Å, down to less than approximately
10 Hz. The more polar the substrate, such as glass beads or
p-methylbenzhydrylamine ("mBHA") resin, the more strongly
attached polar binders (as many peptides are) are restrained.
Less polar substrates, such as polystyrene resin, provide
less restraint for polar binders. In an embodiment wherein
a peptide comprising the sequence CX₆C is bound to an mBHA
resin with a glycine residue serving as a linker to a
binding site on the resin, probably no additional steps need
be taken for 2.5 Å measurements. Additional steps that can
be used, if needed, to slow binder motions include cooling
the measurement sample to, for example, liquid nitrogen
temperatures (approximately 77 K) or binding to a large,
relatively immobile target molecule.
Second, the net binder density is important and
typically is adjusted. The substrate preferably has an
adjustable number of binder synthesis sites or binding sites
per unit of substrate surface area. Too high a binder
density on the substrate surface will cause inter-molecular
nuclear dipole-dipole interactions to distort the REDOR
distance measurements. To obtain accurate intra-molecular
distances, the peptides should be kept sufficiently far apart
so that only intra-molecular nuclear dipole-dipole
interactions are significant. Inter-molecular nuclear
dipole-dipole interactions are preferably kept less than
about 10% of the intra-molecular interaction. In the case of
¹³C-¹⁵N measurements, this criterion can be monitored by
observing ¹³C-¹³C dipolar couplings. As the dipole interaction
falls off as R⁻³, keeping adjacent binders apart by more than
approximately 2-3 times the distance to be measured is
sufficient. For measurements to 5 Å, this criterion can be
satisfied by keeping binders approximately 10 Å or more
apart. At a 10 Å spacing interfering ¹³C or ¹⁵N signals will
not exceed 2.8 Hz, which is sufficient attenuation for 30 Hz
or greater measurements.
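
The inverse-cube falloff makes the spacing rule easy to check numerically. The sketch below is illustrative only; it compares a single inter-molecular pair with a single intra-molecular pair of the same nuclear species and ignores summation over many neighbors, which is a simplifying assumption. The function names are hypothetical.

```python
def intermolecular_fraction(spacing, distance):
    """Ratio of inter- to intra-molecular coupling for like nuclear pairs.

    Both arguments are in the same length unit; the ratio follows from the
    inverse-cube dependence of the dipole-dipole interaction on distance.
    """
    return (distance / spacing) ** 3


def min_spacing(distance, max_fraction=0.10):
    """Smallest spacing keeping the inter-molecular fraction at max_fraction."""
    return distance * (1.0 / max_fraction) ** (1.0 / 3.0)


# A spacing of twice the measured distance gives a fraction of 1/8, and
# three times gives 1/27, bracketing the approximately 10% criterion.
fraction_at_2x = intermolecular_fraction(10.0, 5.0)
fraction_at_3x = intermolecular_fraction(15.0, 5.0)
```

Under this simplification a 10% ceiling corresponds to a spacing of roughly 2.15 times the measured distance, consistent with the 2-3 fold guideline above.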

In an embodiment wherein the binder is a peptide
comprising the sequence CX₆C that is synthesized on an mBHA
resin that is also to serve as the NMR substrate, there is an
additional upper bound on the peptide density. To prevent
disulfide dimer formation in more than approximately 5% of
peptides, the peptides are preferably kept sufficiently far
apart. Such dimerized
scaffolds result in unconstrained, flexible peptides of
altered structure, distorting the REDOR distance determination
of the properly conformationally constrained, cyclized binder
peptide. An adequate separation will meet this
requirement. In this case, more than 95% of the disulfide
bonds will result in intended intra-molecular constraints.
This separation may be adjusted based on a determination of
actual dimer formation by chromatographic (e.g., HPLC) or
mass spectroscopic analysis of the peptide after cleavage
from the substrate (see Section 6.6, infra).

NMR instrumental sensitivity places a lower bound on
binder density. By way of example, for an adequate observed
signal to noise ratio using a preferred NMR spectrometer, a
minimum number of observed nuclear spins should be present in
the sample. This translates to having a molar density of no
less than approximately 0.017 mmole/g. For alternative NMR
spectrometers with higher field magnets (Larmor frequency of
500 MHz), binder density may be as low as 0.0017 mmole/g.
A third substrate condition to be considered is pore
size, which is relevant when measurement of binder bound to a
target molecule is desired. In a preferred method of
such measurement, the substrate must have pores large enough
that target molecules can diffuse into them and bind to
the attached binders. Proteins of approximately
50 kd are typically roughly spherical with diameters of
approximately 50 Å. Preferable substrate pore sizes for use
with such moderate sized protein targets are no less than
100-200 Å. Excessive pore sizes can result in a too dilute
binder that decreases NMR signal intensity. The preferable
pore sizes also facilitate high purity peptide synthesis
directly onto substrate resins by similarly facilitating
diffusion of reagents and solvents to synthesis sites. Also,
binder-substrate binding is preferably of such a nature that
it will not be disrupted under dry conditions, aqueous
conditions, or conditions suitable to binder-target binding.
Generally, adequate pore sizes are in the range of 100-500 Å,
although this will vary with the size of the target molecule.

Solid phase substrates that can be used include but are
not limited to mBHA resins, divinylbenzyl polystyrene resins,
and glass beads. All of these substances can be manufactured
to have binding sites in the range from 0 to 1.0 mmol/g. In
addition, these substrates can be made so as to have the
following surface areas: for mBHA about 100 m²/g, for
polystyrene from 50-100 m²/g, and for glass from 0.1-100 m²/g.
These substrates also can be manufactured so as to have a
surface binding site density in the range of from 0 to 1.0
mmol/m². More generally any microporous material with a
surface density of binding sites adjustable from 0 to at
least 1.0 mmol/m², and preferably with pore sizes in the
preferred ranges, can be used. Suppliers of such adjustable
resins include Chiron Mimotope Peptide Systems (San Diego,
CA) and Nova Biochem (San Diego, CA).

Peptide binders can be synthesized directly on the
surface of the substrates, by way of example as set forth in
Section 6.6 infra, to achieve a purity of preferably at least
90%, more preferably at least 95%. In the case of a peptide
comprising the sequence CX₆C, the preferred peptide spacing on
the substrate is no closer than approximately 10 Å, or a
peptide density of no greater than one peptide every 100 Å².
Peptide synthesis on the preferred resin
p-methylbenzhydrylamine ("mBHA") with 0.16 mmole/g of peptide
binding sites, a surface of 100 m²/g, and a preferable pore
size of 100-200 Å results in a binder-substrate having such
preferable peptide surface density and suitable for accurate
REDOR NMR measurements in dry, hydrated, and bound
conditions. The total binder density is more than tenfold
above instrumental sensitivity. The glycine linker provides
a sufficient spacer from the substrate surface.
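
The relation between resin loading, surface area and peptide spacing can be verified by a unit conversion. The sketch below is illustrative; it assumes binding sites are spread uniformly over the quoted surface area and are fully occupied, and the function names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def area_per_binder_A2(loading_mmol_per_g, surface_m2_per_g):
    """Mean surface area per binder in square angstroms.

    Assumes uniform coverage and that every binding site carries a binder.
    """
    molecules_per_g = loading_mmol_per_g * 1.0e-3 * AVOGADRO
    area_A2_per_g = surface_m2_per_g * 1.0e20  # 1 m^2 = 1e20 square angstroms
    return area_A2_per_g / molecules_per_g


def mean_spacing_A(loading_mmol_per_g, surface_m2_per_g):
    """Mean centre-to-centre spacing in angstroms on a square lattice."""
    return math.sqrt(area_per_binder_A2(loading_mmol_per_g, surface_m2_per_g))


# Values quoted in the text for the preferred mBHA resin.
area = area_per_binder_A2(0.16, 100.0)
spacing = mean_spacing_A(0.16, 100.0)
```

For 0.16 mmole/g on 100 m²/g this gives roughly one binder per 100 Å², i.e. a mean spacing of about 10 Å, in agreement with the preferred spacing stated above.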

Steps 43, 44, and 45 in the preferred embodiment of the
invention are carried out by one of a number of commercial
peptide synthesis sources, such as Chiron Mimotope Peptide
Systems (San Diego, CA) and Nova BioChem (San Diego, CA).
Methods that can be used in these steps are known in the art.
However, the preferred practice of these steps is detailed in
the example in Section 6.6.

The invention thus provides a method of performing solid
state NMR, preferably REDOR NMR, measurements of molecules on
a solid phase substrate. In one embodiment, the molecule is
a compound having conformational degrees of freedom at the
temperature of interest that are limited to torsional
rotations about bonds between otherwise rigid subunits, the
torsional rotations respecting any conformational
constraints. The molecule is preferably a peptide, more
preferably a peptide of constrained conformation, and is most
preferably a peptide having one or more cystines (e.g.,
comprising the sequence CX₆C). In other embodiments, the
molecule is a peptide analog or derivative. In a preferred
embodiment, the substrate is a solid phase on which the
molecule (e.g., peptide) has been synthesized, with a high
degree of purity. In specific embodiments, the REDOR
measurements of the molecule on the substrate can be done in
a dry nitrogen atmosphere, under hydrated conditions, and
when the molecule is either free or bound to a target. The
invention is also directed to a solid phase substrate having
a surface to which is attached a population of molecules
(preferably peptides, peptide derivatives, or peptide
analogs), suitable for obtaining REDOR NMR measurements of
the molecules. In specific embodiments, at least 90% of the
population consists of a single molecule (i.e., 90% purity).
In a more preferred aspect, 95% purity is present. Methods
of producing such solid phase substrates, as described above,
are also provided.

In step 46, REDOR spectroscopy is performed on the
strategically labeled binder peptide-resin sample. Step 46
details include final sample preparation, spectrometer
parameters and tuning, and excitation pulse sequence. Sample
preparation can be carried out by standard methods. The
binder peptide-substrate sample is dried in N₂, and an
approximately 0.1 g amount is sealed in the NMR measurement
rotor. The rotor can be cooled, if necessary, to limit
binder motion.

An alternative final sample preparation step is to bind
the target molecule to the binder peptide-resin sample and
then dry the complex in N₂. Optionally, the binder peptide
can be split from the resin before binding to the target. In
this alternative, the highly accurate REDOR NMR distances are
of the bound binder and thus reflect any conformational
changes that occur upon binding with the target.

A triple resonance, magic angle spinning ("MAS") NMR
machine is adaptable to REDOR measurements. Such machines
are commercially available from Bruker (Billerica, MA),
Chemagnetics (Fort Collins, CO), and Varian (Palo Alto, CA).
An exemplary machine suitable for use is in the laboratory of
Prof. Zax, Cornell University (Ithaca, NY). This machine
includes a 7.05 Tesla magnet from Oxford Instruments (Oxford,
United Kingdom) and RF pulse excitation and receiving
hardware conventional in the NMR art. An exemplary
measurement rotor is a triple resonance, MAS probe from
Chemagnetics.

The exemplary magnetic field is adjusted for a ¹H Larmor
frequency of 300 MHz with corresponding Larmor frequencies
for ¹³C and ¹⁵N of 75.4 and 30.4 MHz, respectively. An
exemplary probe spin frequency (ν_r) is 4.8 kHz, with
corresponding rotor period (T_r) of 0.208 msec. ¹⁵N resonances
are measured. The low natural abundance of ¹⁵N eliminates the
need for natural background corrections. Alternatively, ¹³C
measurements can be done with conventional background
corrections.
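
These spectrometer parameters follow from simple proportionalities, as the following sketch shows. It is illustrative only; the reduced gyromagnetic ratios are standard tabulated magnitudes (the sign of the ¹⁵N ratio is ignored), and the function names are hypothetical.

```python
# Magnitudes of gyromagnetic ratios divided by 2*pi, in MHz per tesla.
GAMMA_OVER_2PI = {"1H": 42.577, "13C": 10.708, "15N": 4.316}


def larmor_from_proton_mhz(proton_mhz, nucleus):
    """Larmor frequency of a nucleus in the field where 1H resonates at proton_mhz."""
    return proton_mhz * GAMMA_OVER_2PI[nucleus] / GAMMA_OVER_2PI["1H"]


def larmor_from_field_mhz(field_tesla, nucleus):
    """Larmor frequency in MHz of a nucleus at the given field strength."""
    return field_tesla * GAMMA_OVER_2PI[nucleus]


def rotor_period_ms(spin_khz):
    """Rotor period in milliseconds for a spinning frequency in kHz."""
    return 1.0 / spin_khz


carbon_mhz = larmor_from_proton_mhz(300.0, "13C")
nitrogen_mhz = larmor_from_proton_mhz(300.0, "15N")
period_ms = rotor_period_ms(4.8)
```

These reproduce the 75.4 MHz, 30.4 MHz and 0.208 msec values quoted above to within rounding.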

REDOR is a pulse NMR technique requiring careful
excitation of appropriate ¹H, ¹³C, and ¹⁵N resonances
synchronous with the MAS rotor and followed by observation of
the ¹⁵N free induction decay. Many alternative REDOR
excitation sequences have been described in the literature,
some of which are found in the references cited hereinabove.
These sequences can involve multiple ¹³C excitations per rotor
period. The simple pulse sequence preferred for use in this
invention requires only one ¹³C excitation per period.

The exemplary sequence for 8 rotor periods is
illustrated in Fig. 4, and is detailed herein in a manner
such that those skilled in the NMR arts can program an NMR
spectrometer for similar measurements. Illustrated
are the ¹H channel 50, the ¹³C channel 51, and the ¹⁵N channel
52. The ¹³C and ¹⁵N RF power supplies are tuned to the
resonances of the nuclei whose distance is to be measured.
The ¹H channel RF power is initially tuned to the resonance of
a proton coupled to the ¹⁵N of interest. The time sequence
(increasing to the right) of the exciting signals (increasing
vertically) in each of these channels is illustrated.

In the ¹⁵N channel, an initial excitation is applied to
the ¹⁵N spins in either of two manners: either an initial π/2
pulse may be applied or, as illustrated and preferred, a
cross polarization transfer from the protons is made.
Sufficient RF intensity is applied at time 54 in both the ¹H
and ¹⁵N channels, 50 and 52 respectively, to achieve a
Hartmann-Hahn precession match at a π spin flip time of 13.2
µsec. Subsequent to the initial ¹⁵N excitation, synchronous π
pulses 56 are applied in phase with the MAS probe rotor for
N_c rotor cycles, denoted by line 59, with sufficient RF
intensity to achieve a π spin flip time of 13.2 µsec. The
phase of these π pulses is varied systematically to reduce
artifacts in a manner well known in the NMR arts. The
preferred sequencing is detailed in Table 1.






Table 1: π Pulse Phase Sequencing

Number of rotor cycles        Phase sequence
between excitation and        (in precessing frame)
observation
----------------------------------------------------
2                             YY
4                             XYXY
8                             XYXYYXYX

The phase sequence is expressed as the axis, in the frame
precessing with the spins, about which the π spin flip is
made. This axis is systematically varied depending on the
number of rotor periods intervening between the excitation
and signal observation. The illustrated phase sequences may
be varied into equivalent sequences in a conventional manner.
For example, "XYXY" is equivalent to "-YX-YX". Finally, at
501 the free induction decay of the ¹⁵N spins is observed and
generates the time domain output signal.

In the ¹H channel, the preferred sequence is an initial
exciting π/2 pulse 53 followed with the previously described
cross polarization transfer 54 to the ¹⁵N spins. The less
preferred sequence omits these initial pulses in favor of a
π/2 ¹⁵N excitation. During the subsequent spin evolution time
for N_c rotor cycles and the free induction decay time 501, a
decoupling field 55 is applied to the protons. The preferred
decoupling field has a 66 kHz RF intensity to achieve a ¹H π
spin flip in 7.6 µsec.
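
The stated RF intensities and spin flip times are related by the fact that a π flip is half a nutation cycle. The conversion can be sketched as follows; this is an illustration with hypothetical function names, in which the RF intensity is expressed as a nutation frequency.

```python
def pi_pulse_us(rf_khz):
    """Duration in microseconds of a pi pulse at an RF nutation frequency in kHz.

    A pi flip is half a nutation cycle, so t_pi = 1 / (2 * nu_1).
    """
    return 1.0e3 / (2.0 * rf_khz)


def rf_field_khz(pi_pulse_duration_us):
    """RF nutation frequency in kHz giving a pi pulse of the stated duration."""
    return 1.0e3 / (2.0 * pi_pulse_duration_us)


proton_pi_us = pi_pulse_us(66.0)      # 1H decoupling field quoted in the text
nitrogen_rf_khz = rf_field_khz(13.2)  # 15N pi pulse duration quoted in the text
```

A 66 kHz field thus corresponds to the 7.6 µsec ¹H π flip stated above, and the 13.2 µsec π flip corresponds to a nutation frequency of roughly 38 kHz.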

In the ¹³C channel, two distinct options must be
measured. The first option (not illustrated) has no ¹³C
exciting pulses. The second option (illustrated) has
synchronous π pulses 57 applied for N_c rotor cycles at the
rotor frequency but with a fixed phase delay 58, denoted by
t_1, and at signal intensity sufficient to achieve a
π spin flip time of 10.6 µsec. Any value of t_1 may be used;
the preferred value is 1/2 the rotor period, T_r/2.
Alternative REDOR pulse sequences include 2 or more ¹³C pulses
per rotor cycle.

Summarizing still with reference to Fig. 4, a REDOR
measurement scan is characterized by the number of rotor
cycles, N_c, of spin evolution. A complete scan comprises,
first, an equilibration period, preceding the illustrated
pulse sequences. Second, there is a ¹⁵N excitation period
comprising pulses 53 and 54. Third, there is a spin
evolution period for N_c rotor cycles which has two options,
both measured. Both options comprise the application of
decoupling ¹H field 55 and synchronous in-phase ¹⁵N π pulses
56. The first option has no ¹³C excitation; the second has
synchronous phase-displaced ¹³C π pulses 57. Fourth, and
finally, there is observation of free induction decay 501 of
the ¹⁵N spins. Fig. 4 illustrates an N_c of 8. Each scan
option is repeated, and the induction decay signal
accumulated, a sufficient number of times to obtain an
acceptable signal to noise ratio. With the preferred
practice, this has required less than approximately 5,000
scans, and typically 3,000 have been sufficient.

An alternative implementation of the REDOR measurement
interchanges the roles of ¹³C and ¹⁵N and measures the free
induction decay of ¹³C. Further, the invention is not limited
to this described pulse sequence and is adaptable to
equivalent pulse sequences yielding direct inter-nuclear
dipole-dipole interaction strengths.

Following REDOR measurement step 46 is data analysis
step 47. This comprises several substeps. As is
conventional, the free induction decay signal is Fourier
transformed from the time domain to the frequency domain.
The scan option without the ¹³C excitation produces a
transformed signal with an observed ¹⁵N resonance peak of
magnitude S; the scan option with ¹³C excitation produces an
observed ¹⁵N resonance peak of magnitude S_f. The REDOR output
signal, denoted ΔS/S, is conventionally formed according to
the equation:

$$\frac{\Delta S}{S} = \frac{S - S_f}{S} \qquad (2)$$

The output signal is observed for different N_c. Preferably 0,
2, 4, and 8 rotor cycles are observed. Other preferred N_c
will be apparent during the following description.

Further analysis of the REDOR output signal, ΔS/S, is
made clearer by a very brief explanation of how this output
signal represents the spin-1/2 dipole-dipole interaction
between the 13C and 15N. In the spin evolution period, the
decoupling excitation eliminates all proton effects from the
13C and 15N NMR spectra. Magic angle spinning, in the scan
option without any 13C excitation, eliminates all nuclear
dipole-dipole and chemical shift anisotropy from the NMR
line. Thus signal S represents an NMR resonance without any
dipole interaction. However, in the second scan option, the
13C π spin flip pulses reintroduce the dipole-dipole
interaction in a controlled manner. This interaction causes
additional dephasing, or loss of signal strength, in the
observed 15N signal. Thus signal S_r represents an NMR
resonance with dipole interaction, and the output signal ΔS/S
represents the percentage strength of pure dipole-dipole
interaction between the 13C and 15N nuclei. The exact loss of
signal strength depends on the timing of the 13C pulses and
the number of rotor cycles for which they are applied.

In the alternative where a general phase delay, t_1, is
used, the expression for the REDOR signal is derived by
numerically integrating the following equations from the Pan
et al. reference (1990, J. Magnetic Resonance 90:330-340):

$$\frac{\Delta S}{S} = 1 - \frac{1}{2\pi}\int_0^{2\pi}\!\!\int_0^{\pi/2} \cos\left[N_c T_r\,\bar{\omega}_D(\alpha,\beta,t_1)\right]\sin\beta\,d\beta\,d\alpha$$

where

$$\omega_D(\alpha,\beta,t) = \pm\tfrac{1}{2}D_{CN}\left[\sin^2\beta\,\cos 2(\alpha+\omega_r t) - \sqrt{2}\,\sin 2\beta\,\cos(\alpha+\omega_r t)\right]$$

$$\bar{\omega}_D(\alpha,\beta,t_1) = \frac{1}{T_r}\left[\int_0^{t_1}\omega_D(\alpha,\beta,t')\,dt' - \int_{t_1}^{T_r}\omega_D(\alpha,\beta,t')\,dt'\right]$$

This integration can be done by standard numerical
integration techniques such as are found in Press et al.,
Numerical Recipes: The Art of Scientific Computing,
Cambridge, U.K., Cambridge University Press (1986), chapter
4, which is herein incorporated by reference. Alternatively,
the expression can be directly evaluated from the symbolic
representations by numerical tools such as Mathematica from
Wolfram Research Inc. (Champaign, IL) or Mathcad from
Mathsoft Inc. (Cambridge, MA). In a preferred embodiment,
however, a much simpler approach is used.

In the preferred embodiment, the 13C pulse phase delay is
1/2 the rotor period, T_r, and the preceding equations can be
simply expressed (Mueller et al., 1995, J. Magnetic
Resonance, in press):

$$\frac{\Delta S}{S} = 1 - \left[J_0(\sqrt{2}\,\lambda)\right]^2 + 2\sum_{k=1}^{\infty}\frac{1}{16k^2-1}\left[J_k(\sqrt{2}\,\lambda)\right]^2, \qquad \lambda = N_c T_r D_{CN} \qquad (5)$$

where J_k is a Bessel function of the first kind. Adequate
accuracy is obtained by limiting the summation of equation 5
to its first five terms. Fig. 5 is a graph of this equation.
Vertical axis 61 represents ΔS/S; horizontal axis 62
represents λ; and graph 63 represents equation 5.
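
As a consistency check on equation 5, the following worked limit (added here for illustration) takes the equation in the form ΔS/S = 1 − [J_0(√2 λ)]² + 2 Σ_k [J_k(√2 λ)]²/(16k² − 1) with λ = N_c T_r D_CN, and uses only the leading small-argument behavior of the Bessel functions. Terms with k ≥ 2 contribute at order λ⁴ and higher and are neglected.

```latex
\begin{aligned}
x &= \sqrt{2}\,\lambda, \qquad J_0(x) \approx 1 - \tfrac{x^2}{4}, \qquad J_1(x) \approx \tfrac{x}{2} \\
1 - \left[J_0(\sqrt{2}\,\lambda)\right]^2 &\approx \tfrac{x^2}{2} = \lambda^2 \\
\frac{2}{16\cdot 1^2 - 1}\left[J_1(\sqrt{2}\,\lambda)\right]^2 &\approx \frac{2}{15}\cdot\frac{\lambda^2}{2} = \frac{\lambda^2}{15} \\
\frac{\Delta S}{S} &\approx \lambda^2 + \frac{\lambda^2}{15} = \frac{16}{15}\,\lambda^2 + O(\lambda^4)
\end{aligned}
```

For short evolution periods the REDOR output signal therefore grows quadratically in N_c T_r D_CN, which is why observations at several values of N_c help constrain D_CN.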

In detail, step 47 of Fig. 3 uses equation 5 and the
REDOR output signal, ΔS/S, for various values of N_c to obtain
a best value for D_CN, the dipole interaction strength. The
internuclear distance is simply and directly determined from
D_CN by equation 1. An exemplary method for finding the best
value of D_CN is to use a least squares method. First, form
the sum of the squares of the differences between the observed
ΔS/S and the ΔS/S computed from equation 5, which will be a
function of D_CN, T_r, and N_c through λ. Second, find the value
of D_CN minimizing this function by searching exhaustively in
sufficiently small increments over the relevant range. For
example, D_CN can be varied by varying R in 0.01 Å increments
from 0.5 to 8 Å. More efficient minimization methods as
presented in Press et al., chapter 10, can also be used.
Values of the Bessel functions can be simply calculated by
the methods in Press et al., supra, § 6.4. Alternatively,
this minimization and best value determination is easily
performed directly from the symbolic representations with the
previously cited mathematical packages.
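
The exhaustive least squares search described above can be sketched as follows. This is a minimal illustration, not the program of this specification: it assumes equation 5 in the Bessel-series form ΔS/S = 1 − [J_0(√2 λ)]² + 2 Σ_k [J_k(√2 λ)]²/(16k² − 1) with λ = N_c T_r D_CN, reads "first five terms" as k = 1 to 5, and stands in for equation 1 with the assumed relation D_CN ≈ 3066 Hz / R³ (R in Å) for a 13C-15N pair. The function names and the constant are hypothetical.

```python
import math

# Assumed stand-in for equation 1: 13C-15N dipolar coupling (Hz) at 1 angstrom.
D_CN_AT_1_ANGSTROM_HZ = 3066.0


def bessel_j(k, x, n=256):
    """J_k(x) from (1/pi) * integral_0^pi cos(k*tau - x*sin(tau)) dtau,
    evaluated by the trapezoid rule (adequate while |x| + k is well below n)."""
    h = math.pi / n
    total = 0.5 * (1.0 + math.cos(k * math.pi))  # end points tau = 0 and tau = pi
    for i in range(1, n):
        tau = i * h
        total += math.cos(k * tau - x * math.sin(tau))
    return total * h / math.pi


def redor_signal(lam, n_terms=5):
    """Delta S / S from equation 5, truncating the sum after n_terms terms."""
    x = math.sqrt(2.0) * lam
    tail = sum(bessel_j(k, x) ** 2 / (16 * k * k - 1) for k in range(1, n_terms + 1))
    return 1.0 - bessel_j(0, x) ** 2 + 2.0 * tail


def fit_distance(observed, rotor_period_s, r_min=0.5, r_max=8.0, step=0.01):
    """Exhaustive least squares search over the distance R (angstroms).

    observed maps the number of rotor cycles N_c to the measured Delta S / S.
    Returns (best R, sum of squared residuals at that R)."""
    best_r, best_sse = None, float("inf")
    n_steps = int(round((r_max - r_min) / step))
    for i in range(n_steps + 1):
        r = r_min + i * step
        d_cn = D_CN_AT_1_ANGSTROM_HZ / r ** 3
        sse = 0.0
        for n_c, delta_s in observed.items():
            lam = n_c * rotor_period_s * d_cn
            sse += (delta_s - redor_signal(lam)) ** 2
        if sse < best_sse:
            best_r, best_sse = r, sse
    return best_r, best_sse
```

A call such as `fit_distance({0: 0.0, 2: s2, 4: s4, 8: s8}, rotor_period_s)` returns the grid value of R whose computed signals best match the measured ones; a gradient-free minimizer could replace the grid once the search range is narrowed.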

The example in Section 6.6 provides typical results of
this measurement and analysis method.

This completes the method of Fig. 3 and determines the
internuclear distance between the 13C and 15N nuclei to which
the excitation channels were tuned for the REDOR NMR
measurements. If other C-N pair distances are to be
determined in the labeled binder, step 46 as detailed above
is repeated for the other distinct resonances. If the
alternative 15N resonances cannot be distinguished, separately
labeled binders are prepared and measured.


5.7. CONSENSUS, CONFIGURATIONAL BIAS MONTE CARLO
Broad overview

With reference to Fig. 1, having found N specifically
binding members of one or more libraries, step 2, selected a
candidate pharmacophore shared by all these binders, step 3,
and determined a few strategic distances in the vicinity of
the candidate pharmacophore, step 4, precise pharmacophore
and binder peptide structures are now determined by the
preferred method, the consensus, configurational bias Monte
Carlo (CCBMC) method. Other orderings and identities of these
steps are possible. For example, the binders may be
predetermined, thereby rendering step 2 unnecessary. Further,
no strategic distance measurements may need to be made, and
step 4 may be omitted. Alternatively, a partial structure
determination step may be inserted before step 4 to guide
selection of distances for measurement.
Pharmacophore structure determination of this invention
is not limited to the CCBMC method to be described. CCBMC
makes the most efficient use of heuristic consensus binding
and partial distance measurement information. However, the
consensus pharmacophore can be determined by methods
including but not limited to use of exhaustive REDOR NMR
measurements, or by extensive but fewer REDOR measurements in
conjunction with a conventional molecular structure
determination method, such as molecular dynamics,
conventional Monte Carlo, or even peptide folding rules.

In the following description, the CCBMC method is
broadly overviewed; subsequently, details of important steps
are described; and finally a description of the preferred
computer method and apparatus for practicing the invention is
given. From the description of the methods, equations, data
structures, and programs provided herein, one will be able
readily to translate them into implementations.

Although the following descriptions are directed to
binders isolated from the preferred library of peptides
comprising the sequence CX6C (constrained by disulfide bonds),
the method is applicable to more general organic diversity
library members. It is immediately applicable to compounds
from constrained peptide libraries with other scaffolds and
also to compounds from similar peptoid libraries. It will be
readily apparent that the method is applicable to any
compounds whose structural region of interest exhibits
conformational degrees of freedom at a temperature of
interest (e.g., body temperature, 37 °C) that are limited to
torsional rotations of rigid molecular subunits about bonds
between the subunits, in which any loops present in the
structural region of interest are independently rotatable by
concerted rotation (see Section 7, Appendix: Concerted
Rotation). Examples of such compounds include but are not
limited to peptides, peptoids, peptide derivatives, peptide
analogs, etc., including members of libraries discussed in
Section 5.2, supra.

General features of Monte Carlo simulation methods are
known. A reference is Rowley, Statistical Mechanics for
Thermophysical Property Calculations, Englewood Cliffs, N.J.,
PTR Prentice Hall (1994), especially chapters 5 and 7, which
is herein incorporated by reference. The application of
simple Monte Carlo to constrained peptides has conventionally
been hindered by difficulty generating geometrically proper
and energetically useful conformational alterations, and by
the consequent wasteful and inefficient exploration of
conformational space. This method overcomes these problems
for constrained peptides with a novel combination of
techniques. In addition, this method is uniquely able to
incorporate partial information about binding affinities and
distance measurements to improve determination of the
pharmacophore structure, one goal of the invention.

Fig. 8 is an overview of the method. Step 91 represents
the initial geometric and chemical structure of each binding
peptide in computer memory. Peptide geometric structure is
represented as a set of records, each record representing one
rigid subunit or one atom of the peptide. The subunit
records are linked together as the subunits are linked in the
peptide molecule. Each rigid unit record includes fields for
the composition, structure, and connectivity of the rigid
unit represented. Since the rigid units only undergo
torsional rotations about mutual bonds, their internal
geometric structure is fixed.
If a previous run with these peptides has been done, the
initial peptide structure may be chosen as one of the
structures generated late in that run. Such an initial
structure is desirable since the effects of arbitrary initial
conditions have been eliminated. Alternatively, an initial
structure is generated from a prototypical backbone without
side chains by adding side chains with random torsional
orientations. For members of each type of diversity library,
a prototypical backbone meeting structural constraints and
representing an allowed configuration for a member possessing
[...]. A prototypical backbone for
the CX6C library is generated from the CCBMC model itself as
run for the linear peptide C(Gly)6C (SEQ ID NO: 7) using a
Hamiltonian consisting only of the [...] term. The [...] term
contains only terms which, in the disulfide bond backbone
[...]. In this run
for a linear peptide, no Type II backbone moves are made;
selected portions of the backbone are used to generate
backbone alterations. The model is run with temperatures
gradually decreasing from room temperature to a small
temperature, approximately 1 [...]. The final low-temperature
[...] prototypical backbone. [...]

In memory, for each peptide, a current structure is
represented, the initial current structures being the just
assigned initial structures. Also in memory is represented a
proposed modified structure for one peptide. At step 92 the
processor generates "moves" that transform the current
structure [...] temperature [...] the
thermal agitation experienced by the binders so that their
equilibrium structure may be determined.

Generation of these moves for conformationally [...]
method. There are two move types. Type I moves alter the
side chain conformation of a randomly chosen amino
acid of a [...] peptide. The alteration is built
by side chain removal followed by side chain regrowth into a
[...] regrowth, unfavorable
[...] is avoided. Type II
moves alter the conformation of a limited random region of
the peptide backbone of a randomly chosen binder by
performing linked, or "concerted", rotations, the linking
being such that only four backbone rigid units are spatially
displaced. Thereby the internally bonded ring of 8 amino
acids will not be disrupted. A reference describing a
similar move in linear alkane molecules is Dodd et al., A
concerted rotation algorithm for atomistic Monte Carlo
simulation of polymer melts and glasses, Molecular Phys., vol
76, pp 961 et seq. (1991), which is herein incorporated by
reference. The ratio between the Types I and II moves is an
adjustable parameter with a preferred value of 4.

Another important aspect of this method is that both
moves are selected in a "configurationally biased" manner.
Normal Monte Carlo methods use standard Metropolis
procedures, in which each proposed structure is generated
randomly and independently of the current structure with an
equal a priori probability. However, for complex molecules,
it is known that this typically results in the generation of
many highly improbable or energetically unlikely structures.
In some situations up to 10^ wasted moves are generated for 
each useful move, a very considerable waste of processor
resources. In contrast, the method of this invention
generates proposed structures according to an a priori
probability depending on the current structure and the
energetic cost of the new structure. This bias toward more
acceptable structures of lower energy avoids generating
highly improbable structures, making a very much more
efficient use of processor resources. Because detailed
balance must be satisfied, the acceptance probability of the
configurationally biased method must include factors in
addition to the usual Boltzmann factor. A reference applying
a similar method for simple linear alkanes is Smit et al.,
Computer simulations of the energetics and siting of
alkanes in zeolites, J. Phys. Chem. vol 98, pp 8442 et seq.
(1994), which is herein incorporated by reference.
At step 93 the processor evaluates the energy, or
Hamiltonian, of the proposed configuration. The Hamiltonian
contains two groups of terms: conventional physical energy
terms, and heuristic constraint terms. Conventional terms
include the energies of rigid unit torsional rotations and of
Lennard-Jones, electrostatic, and H-bonding interactions
between atoms in different rigid units. Bond lengths and
angles are assumed fixed at the temperature of interest and
their energies constant. These conventional interactions are
exclusively intramolecular; no physical intermolecular
interaction effects are considered in this invention.
References for the conventional energies are Weiner et al.,
An all atom force field for simulations of proteins and
nucleic acids, J. of Computational Chem., 7:230-52 (1986);
and Weiner et al., A new force field for molecular simulation
of nucleic acids and proteins, J. Amer. Chem. Soc. 106:765
(1984) (herein referred to as the "AMBER references"), which
are herein incorporated by reference.

Another important aspect of the Monte Carlo method of
this invention is the heuristic terms: the consensus term and
the measurement constraint term. They uniquely make use of
partial information on the binder peptides to guide the Monte
Carlo simulation. The consensus term, H_consensus, is added to
the Hamiltonian to represent that all the binders do in fact
bind to the same protein target in the same physical and
chemical manner. Since binding occurs at the shared
candidate pharmacophore in each binder, this term makes
energetically unfavorable moves that cause the geometric
structure in the shared pharmacophore to depart from an
average, common structure. Pseudo chemical "bonds" to this
average structure are added which mimic the actual physical
bonding to the surface groups of the protein target. If the
candidate pharmacophore is in fact the actual pharmacophore,
this energy will become minimized and small in the
equilibrium configuration, since there will be an actual,
shared, geometric configuration. If the candidate
pharmacophore is not the actual one, this term will not
become minimized or small, as there is no physical reason for
this region of the peptide molecules to share a common
structure. This is the only Hamiltonian term which couples
the N binders together; no physical intermolecular effects
are considered. The binders are otherwise treated
independently by the method.

The measurement constraint term, H_NMR, is added to
represent the distance measurements made, which are in fact
actual distances in the molecules and constrain any simulated
structure. This term makes energetically unfavorable, by
adding pseudo chemical bonds of the measured lengths, moves
that cause the constrained internuclear distances to depart
from their measured values. Of course if no partial distance
measurements have been made or are otherwise available, this
term may simply be omitted from the Hamiltonian without
adversely affecting the practice of this step. Which
measurements to make, if any, is guided by the results of the
consensus structure determined. If an adequate structure can
be obtained without assistance of distance measurements, none
need be incorporated. If inadequate results are obtained,
additional iterations of the method will need distance
measurement inputs.

Step 94 tests the proposed structure against an
acceptance probability, accept(curr->prop). This acceptance
probability is determined by the energy of the proposed
structure previously computed in step 93. If the proposed
structure fails this test and is not accepted, the method
progresses immediately to step 96. If the proposed structure
meets the test and is accepted, the accepted proposed
structure replaces and becomes the current structure. The
proposed structure of this peptide is also saved (given
certain other conditions detailed later) in a separate memory
store of structures for later analysis. This structure store
is preferably on disk.
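
The propose, test, and save cycle just described can be sketched as a short driver loop. This is an illustrative skeleton under stated assumptions, not the program of this specification: `propose` is a hypothetical callback standing in for move generation and energy evaluation, the ratio of 4 is read as four Type I moves per Type II move, and every accepted structure is saved (the additional saving conditions mentioned above are not modeled).

```python
import random


def run_monte_carlo(initial, propose, n_moves, rng, type1_to_type2_ratio=4.0):
    """Drive the generate / accept / save cycle.

    propose(structure, move_type, rng) must return (proposed, accept_prob),
    where accept_prob plays the role of accept(curr->prop).
    Returns (final current structure, list of saved accepted structures)."""
    current = initial
    saved = []
    p_type1 = type1_to_type2_ratio / (type1_to_type2_ratio + 1.0)
    for _ in range(n_moves):
        # Choose a side chain (Type I) or backbone (Type II) move.
        move_type = "I" if rng.random() < p_type1 else "II"
        proposed, accept_prob = propose(current, move_type, rng)
        # Acceptance test: a rejected proposal leaves the current structure unchanged.
        if rng.random() < accept_prob:
            current = proposed
            saved.append(proposed)
    return current, saved
```

In practice the saved list would be written to disk rather than held in memory, and the loop would stop on the sufficiency test rather than after a fixed number of moves.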

Repeated application of the concerted rotation may lead
to a slightly imperfect structure, due to numerical precision
errors. In an alternative embodiment, peptide geometry would
be restored to an ideal state by application of the Random
Tweak algorithm after several thousand moves (Shenkin et al.,
1987, Biopolymers 26:2053-85).

Step 96 tests whether enough structures of equilibrated
total energy have been generated in this simulation run. The
run terminates if a sufficient number have been generated.
Sufficiency is determined on the basis of whether the
statistical sampling error of the average pharmacophore
structure determined at step 97 is adequate (typically, less
than 0.25 Å). Preferably, 25,000 equilibrated structures
would be accumulated for each run. Also, preferably, three
runs would be performed for a total of 75,000 saved
structures.

Fig. 9 illustrates energy equilibration of an actual
run. Axis 101 is the total energy of a set of peptide
binders; axis 102 is the number of moves accepted. Traces 103
represent total energies of all binders from each of the
three runs. Typically, run energy rapidly equilibrates,
within less than approximately 2000 moves in most cases.
Subsequent saved structures are counted toward termination.
Traces 103 display typical energy variations superimposed on
a secular stability. The illustrated energy variations
typically comprise several components having different
variabilities. First, there is a very high frequency
oscillation with a period of a few tens of moves (known as
"hair"). Second, there is a low frequency oscillation with a
period of several hundred to a few thousand moves and with
low amplitude.

Step 97 analyzes the structures stored in memory. In the
simplest preferred embodiment, the stored geometric
structures for each binder are simply averaged, yielding a
final structure for each binder and for the candidate
pharmacophore. In another alternative, clustering software
seeks clusters of similar structures for each binder. The
clusters are then averaged to give a final structure for each
variant structure for each binder. The variants represent
alternative foldings for the binder. Exemplary clustering
methods are found in Gordon et al., Fuzzy cluster analysis of
molecular dynamics trajectories, Proteins: Structure,
Function and Genetics 14:249-264 (1992).

Alternative post-processing can be done on the clustered
structures to account for small bond angle vibrations. Such
vibrations are expected to make small perturbations to the
clustered structures determined by the Monte Carlo method and
can be accounted for by a brief molecular dynamics
simulation. Such a simulation is fully defined by the
Hamiltonian, comprising the physical and heuristic energies
to be described infra in Eqn. 8, and by the temperature of
interest. The structures observed during the simulation are
averaged to determine a final more accurate equilibrium
structure. A code capable of performing such a simulation is
Discover® from BIOSYM (San Diego, CA). Preferably, the
molecular dynamics simulation would be run for approximately
10^[...] bond angle vibration periods. Since the typical bond
angle vibration period is [...], such a
run will encompass approximately 1 ns of molecular time.
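
The simple averaging of step 97, together with the sampling error used in the sufficiency test of step 96, can be sketched as follows. This is a minimal illustration with assumptions stated plainly: the saved structures are taken to be lists of Cartesian atom positions already expressed in a common frame, the samples are treated as statistically independent, and the function names are hypothetical.

```python
import math


def average_structure(structures):
    """Average saved structures atom by atom.

    structures: list of structures, each a list of (x, y, z) positions
    in a common frame (at least two structures are required).
    Returns (mean positions, per-atom standard error of the mean position)."""
    n = len(structures)
    n_atoms = len(structures[0])
    means, errors = [], []
    for a in range(n_atoms):
        mean = tuple(sum(s[a][c] for s in structures) / n for c in range(3))
        # Sample variance of the position, summed over x, y and z.
        var = sum(
            (s[a][c] - mean[c]) ** 2 for s in structures for c in range(3)
        ) / (n - 1)
        means.append(mean)
        errors.append(math.sqrt(var / n))
    return means, errors


def sampling_is_adequate(errors, tolerance=0.25):
    """True when every atom's sampling error is below the tolerance (angstroms)."""
    return max(errors) < tolerance
```

Because successive Monte Carlo structures are correlated, the independent-sample error computed here is a lower bound; a block-averaging estimate would be a more cautious choice.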

Configurational bias move generation details

One Type I or II move will, in general, alter the
position of several rigid units on a side chain or along the
backbone. Each altered rigid unit is sequentially considered
during move generation. The Hamiltonian describing the
energy of the rigid unit currently being considered in a move
is divided into an internal, u^int, and an external, u^ext, part,
where u^ext is all energy not included in u^int. In the preferred
embodiment, u^int is set to 0; an alternative choice would be
to include only the torsional interaction energy between this
rigid unit and units to which it is currently bound. u^int
generates a probability distribution, p^int, according to which
is generated a set, φ_ik, k = 1...K, of candidate torsional
angles for the bond between the rigid unit being examined and
rigid units already examined. u^ext generates another
probability distribution, p^ext, according to which is selected
one torsional angle from the prior set as the proposed new
angle for the rigid unit being examined. These probabilities
are defined by the equations:

$$p_i^{int}(\phi_{ik}) \propto \exp\left[-\beta\,u_i^{int}(\phi_{ik})\right]$$

$$p_i^{ext}(\phi_{ik}) = \frac{\exp\left[-\beta\,u_i^{ext}(\phi_{ik})\right]}{\sum_{j=1}^{K}\exp\left[-\beta\,u_i^{ext}(\phi_{ij})\right]} \qquad (6)$$

In this equation, i signifies the rigid unit being
considered, K is the total number of candidate torsional
angles generated by p^int, and β = 1/kT (k is Boltzmann's
constant; T the temperature, preferably 37 °C). The overall
probabilities of generating the proposed structure from the
current structure and of accepting the proposed structure
are given by the equations:

$$P(curr \to prop) \propto \prod_{i=1}^{M} p_i^{int}(\phi_i)\,p_i^{ext}(\phi_i)$$

$$W^{new} = \prod_{i=1}^{M}\sum_{j=1}^{K}\exp\left[-\beta\,u_i^{ext}(\phi_{ij})\right]$$

$$accept(curr \to prop) = \min\left(1, \frac{W^{new}}{W^{old}}\right) \qquad (7)$$

In this equation, M is the total number of rigid units added
in the move. W^old is a weight for the reverse move and will
be described subsequently.
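
The candidate generation, biased selection, and weight-ratio acceptance described above can be sketched for a toy chain of torsional angles. This is a sketch under assumptions, not the program of this specification: u^int is taken as 0 so candidates are drawn uniformly, the reverse-move weight W^old is computed in the standard configurational bias way (retracing the current angles alongside K − 1 fresh candidates per unit), which is assumed here because its definition lies outside this passage, and the callback `u_ext(i, placed, phi)` is a hypothetical stand-in for the external energy.

```python
import math
import random


def grow_unit(u_ext, k_trials, beta, rng, retrace_angle=None):
    """Handle one rigid unit: draw K candidate torsions, weight each by
    exp(-beta * u_ext), and either select one (new structure) or keep the
    existing angle (retracing the current structure).

    Returns (angle, sum of Boltzmann factors over the K candidates)."""
    # u_int = 0, so candidate angles are uniform on [-pi, pi].
    candidates = [rng.uniform(-math.pi, math.pi) for _ in range(k_trials)]
    if retrace_angle is not None:
        candidates[0] = retrace_angle
    factors = [math.exp(-beta * u_ext(phi)) for phi in candidates]
    weight = sum(factors)
    if retrace_angle is not None:
        return retrace_angle, weight
    # Select one candidate with probability p_ext = factor / weight.
    threshold = rng.random() * weight
    running = 0.0
    for phi, factor in zip(candidates, factors):
        running += factor
        if threshold < running:
            return phi, weight
    return candidates[-1], weight


def cbmc_move(current, u_ext, k_trials, beta, rng):
    """Propose new torsions for M rigid units, one unit at a time.

    u_ext(i, placed, phi) is the external energy of unit i at angle phi
    given the angles already placed. Returns (proposed angles,
    acceptance probability min(1, W_new / W_old))."""
    proposed, w_new = [], 1.0
    for i in range(len(current)):
        phi, w = grow_unit(lambda p: u_ext(i, proposed, p), k_trials, beta, rng)
        proposed.append(phi)
        w_new *= w
    retraced, w_old = [], 1.0
    for i, phi_old in enumerate(current):
        _, w = grow_unit(lambda p: u_ext(i, retraced, p), k_trials, beta, rng, phi_old)
        retraced.append(phi_old)
        w_old *= w
    return proposed, min(1.0, w_new / w_old)
```

With a flat external energy every candidate has the same weight and the move is always accepted; with a structured energy the selection step favors low-energy angles and the weight ratio corrects for that bias.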

Because energy is included in the generation
probabilities, proposed structures are preferentially of
lower energy. Since the acceptance of proposed structures
depends on their energies, the acceptance of proposed
structures is thereby more probable.

Peptide memory representation details

It is well known that at body temperature peptides
consist of linked rigid units capable only of torsional
rotation about mutual bonds whose lengths and angles are
fixed. The torsional rotations respect any molecular
conformational constraints. See Cantor et al., Biophysical
Chemistry, Part I: The Conformation of Biological
Macromolecules, New York, W.H. Freeman and Co. (1980), which
is herein incorporated by reference. Table 2 lists the rigid
units encountered in the preferred embodiment of this
invention utilizing libraries of conformationally constrained
peptides. Table 2, where applicable, also lists dihedral
bond angles between incoming and outgoing bonds to a rigid
unit and the assigned unit type.

Table 2

Type   Chemical Structure        Bond angle (if applicable)

Backbone and side chain rigid units
A      -NH2
B      -CαH- (branched)          70.5°
C      -CONH-                    70.5°
D      -COOH

Side chain only rigid units
E      -CH2-                     70.5°
F      -CH- (branched)           70.5°
G      -S-                       70.5°
H      -C6H4-
I      -CH3
J      -OH
K      -SH
L      -NH2
M      -C6H5
N      -CONH2
O      -CN3H4
P      -C3N2H3
Q      -C8NH6


Table 3 illustrates the decomposition of all amino acid side 
chains into rigid units. Glycine is a special case, without 
a side chain. Proline is a special case with a side chain 
cyclically bonded to the backbone amino N. 

Table 3

Amino Acid       Rigid Units

Glycine          -CαH2- (SPECIAL CASE)
Alanine          -CH3
Arginine         -CH2-CH2-CH2-CN3H4
Aspartate        -CH2-COOH
Asparagine       -CH2-CONH2
Cysteine         -CH2-SH
Glutamate        -CH2-CH2-COOH
Histidine        -CH2-C3N2H3
Isoleucine       -CH(-CH3)-CH2-CH3
Leucine          -CH2-CH(-CH3)2
Lysine           -CH2-CH2-CH2-CH2-NH2
Methionine       -CH2-CH2-S-CH3
Phenylalanine    -CH2-C6H5
Serine           -CH2-OH
Threonine        -CH(-CH3)-OH
Tryptophan       -CH2-C8NH6
Valine           -CH(-CH3)-CH3
Tyrosine         -CH2-C6H4-OH


Fig. 10 illustrates a structurally correct but
geometrically inaccurate decomposition of the peptide
backbone CX6C into rigid units (inessential hydrogens have
been omitted). Rigid units are set off in boxes 121 and
their types 122 are indicated. Fig. 11 illustrates a
structurally correct but geometrically inaccurate
decomposition of the peptide backbone and side chains of
-arginine-glycine-aspartate- ("RGD") into rigid units. Rigid
units are set off in boxes 131 and their types 132 are
indicated.

Rigid units are represented as records in memory. The
data structure for a peptide comprises records for its
constituent rigid units linked together by data pointers
exactly as the actual rigid units in the peptide are
chemically linked. The record representing a rigid unit
comprises fields for: type of the unit, pointers to
chemically bonded units, all atoms of the unit and their
spatial positions, atoms of the unit that are the target of
the incoming and outgoing bonds, amino acid to which the unit
belongs, and atomic composition of the unit.

A known, conventional representation of atoms and atomic
interactions is taught by the AMBER references. Each element
is divided into a series of subtypes of specific properties.
For example, for carbon there are subtypes C, C2, CA, CT,
etc.; for nitrogen, there are N, N2, etc.; for oxygen, there
are O, O2, etc.; and for hydrogen, there are H, H2, etc.
Bonds between each pair of subtypes are separately
characterized by equilibrium lengths, angles, and torsional
energies. Interactions between each pair of subtype atoms
are separately characterized by Lennard-Jones force
parameters, hydrogen bonding force parameters, and
electrostatic charges. Amino acid charge distributions are
in Weiner et al., J. of Computational Chem., 7:230-52
(1986).

Thus each atom in each rigid unit is represented by an
in-memory record comprising fields for: its AMBER reference
subtype and any electrostatic charge. The atom's spatial
position relative to its containing rigid unit, stored in
that unit's record, is geometrically determined from the
unit's internal chemical structure and bonds by the AMBER
bond lengths and angles defined for each of these bonds. The
relative spatial positions of atoms within a rigid unit are,
of course, fixed, and there is no interaction energy to
consider between atoms within a rigid unit.
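
The rigid unit and atom records described above might be laid out as in the following sketch. The field names are hypothetical and chosen to mirror the listed fields; the subtype strings and zero charges in any usage are placeholders rather than actual AMBER parameter values. The sketch links units as a parent and child tree, so a ring closure such as the disulfide bond would need an extra cross-link that is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Atom:
    amber_subtype: str                    # AMBER subtype label, e.g. "CT"
    charge: float                         # electrostatic charge, if any
    position: Tuple[float, float, float]  # fixed, relative to the containing unit


@dataclass
class RigidUnit:
    unit_type: str                        # type letter from Table 2
    amino_acid: str                       # amino acid the unit belongs to
    atoms: List[Atom]
    composition: Dict[str, int]           # atomic composition, element -> count
    incoming_atom: Optional[int] = None   # index of the atom receiving the incoming bond
    outgoing_atoms: List[int] = field(default_factory=list)
    # Pointers to chemically bonded units (excluded from repr and comparison
    # because parent and child refer to each other).
    parent: Optional["RigidUnit"] = field(default=None, repr=False, compare=False)
    children: List["RigidUnit"] = field(default_factory=list, repr=False, compare=False)

    def bond_to(self, child: "RigidUnit", via_atom: int) -> "RigidUnit":
        """Link child to this unit through the atom at index via_atom."""
        self.outgoing_atoms.append(via_atom)
        child.parent = self
        self.children.append(child)
        return child


def unit_types(root: RigidUnit) -> List[str]:
    """Depth-first list of unit types reachable from root."""
    types = [root.unit_type]
    for child in root.children:
        types.extend(unit_types(child))
    return types
```

An aspartate residue, for example, would be built as a type B backbone unit bonded to a type E unit (-CH2-) bonded in turn to a type D unit (-COOH), following Table 3.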

Fig. 11 is a complete memory representation of a
tripeptide sequence -RGD- (a known pharmacophore). Rigid
units are set off in boxes 131 and their types 132 are
indicated. The torsional degrees of freedom between the
rigid units are indicated by angle arrows 133. AMBER atom
types are indicated as at 134. Net atomic charges are
indicated only for arginine as at 135. Rigid unit records
are linked into a data structure modeling the rigid units'
physical linkages. Not shown are relative atomic spatial
positions represented by the atoms' rectangular coordinates.

All parameters defining the AMBER atomic representations
and interatomic forces can be found in Weiner et al., J. of
Computational Chem., 7:230-52 (1986), and Weiner et al., J.
Amer. Chem. Soc., 106:765 (1984). Conventionally, these
parameters are obtained from computer readable files from
commercial sources. The preferred computer readable source
of these parameters is from Insight II® 2.3.5 software from
BIOSYM (San Diego, CA). Other sources are Tripos (St. Louis,
MO) and CHARMm (Molecular Simulations, Inc., Burlington, MA).

Interaction energy evaluation details

The form of the intramolecular energy, or Hamiltonian,
evaluated at step 93, is an important element of this
invention. The Hamiltonian consists of the components:

$$H_{total} = \sum_{I=1}^{N_{binders}} H_{I,total}, \qquad H_{I,total} = H_{I,molecular} + H_{I,NMR} + H_{I,consensus} \qquad (8)$$

The H_{I,molecular} component is determined from the Weiner et al.
references, J. of Computational Chem. 7:230-52 (1986), and
J. Amer. Chem. Soc. 106:765 (1984):

$$H_{I,molecular} = \sum_{\substack{\text{rigid unit}\\ \text{torsional angles}}} \frac{V_n}{2}\left[1 + \cos\left(n\phi_{I,i} - \gamma\right)\right] + \sum_{\text{atom pairs}}\left[\frac{A_{ij}}{R_{I,ij}^{12}} - \frac{B_{ij}}{R_{I,ij}^{6}}\right] + \sum_{\text{atom pairs}}\frac{q_i q_j}{\epsilon R_{I,ij}} + \sum_{\text{H-bond pairs}}\left[\frac{C_{ij}}{R_{I,ij}^{12}} - \frac{D_{ij}}{R_{I,ij}^{10}}\right] \qquad (9)$$

Here, φ_{I,i} is the i'th torsional angle between rigid units of
the I'th binder peptide, and R_{I,ij} is the interatomic distance
between the i'th and j'th atoms in different rigid units of
the I'th binder. The first term in this equation is the
torsional energy of rigid units; the second is the
interatomic Lennard-Jones energy; the third is the interatomic
electrostatic energy; and the fourth is the interatomic
hydrogen bond energy. Rigid unit torsional rotations
directly change the first term. Such rotations indirectly
change all other terms as interatomic distances change.

The AMBER parameters V_n, A_ij, B_ij, q_i, C_ij and D_ij are
obtained as stated above. The effect of water is
approximated in a known manner by setting ε equal to 4ε_0 r,
where r is distance (in Å) in the electrostatic term and ε_0 is
the vacuum permittivity.
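
The four conventional energy terms named above (torsional, Lennard-Jones, electrostatic with a distance-dependent dielectric, and hydrogen bond) can be sketched as small functions. This is an illustration in the usual AMBER functional forms, which are assumed here; the Coulomb conversion constant of 332 kcal·Å/(mol·e²) and all function names are assumptions, and actual parameter values would come from the AMBER references.

```python
import math

# Assumed unit conversion for charges in units of e and distances in angstroms.
COULOMB_CONSTANT = 332.0  # kcal * angstrom / (mol * e^2)


def torsion_energy(phi, v_n, n, gamma=0.0):
    """Torsional term (V_n / 2) * [1 + cos(n * phi - gamma)]."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


def lennard_jones_energy(r, a_ij, b_ij):
    """Lennard-Jones 12-6 term A / r^12 - B / r^6."""
    return a_ij / r ** 12 - b_ij / r ** 6


def electrostatic_energy(r, q_i, q_j):
    """Coulomb term with the distance-dependent dielectric eps = 4 r."""
    return COULOMB_CONSTANT * q_i * q_j / (4.0 * r * r)


def hydrogen_bond_energy(r, c_ij, d_ij):
    """Hydrogen bond 12-10 term C / r^12 - D / r^10."""
    return c_ij / r ** 12 - d_ij / r ** 10


def molecular_energy(torsions, lj_pairs, charge_pairs, hbond_pairs):
    """Sum the four terms for one binder.

    torsions: iterable of (phi, v_n, n, gamma)
    lj_pairs: iterable of (r, a_ij, b_ij)
    charge_pairs: iterable of (r, q_i, q_j)
    hbond_pairs: iterable of (r, c_ij, d_ij)"""
    total = sum(torsion_energy(*t) for t in torsions)
    total += sum(lennard_jones_energy(*p) for p in lj_pairs)
    total += sum(electrostatic_energy(*p) for p in charge_pairs)
    total += sum(hydrogen_bond_energy(*p) for p in hbond_pairs)
    return total
```

Only atom pairs in different rigid units would be passed in, since the text states that atoms within one rigid unit have no interaction energy to consider.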

The distance constraint term, as described, makes
energetically unfavorable moves which cause those measured
interatomic separations in the simulation to depart from
their measured values. If no measured values are available,
this term is simply omitted from the Hamiltonian. Since this
is not a physical energy and in simulation equilibrium the
binders should have the measured distance, it is advantageous
that this term should make only a small contribution to the
equilibrium energy, no more than 10% of the total energy and
preferably approximately 2.5 to 5%. Further, it is
advantageous that the energetic disfavor be weighted by the
confidence in the measurements, so that measurements having
more confidence have a greater effect.



Many forms of this energy meet these criteria. The preferred form is:

(10)

where $R^{(0)}_{I,ij}$ is a measured distance in the I'th binder peptide between atomic pair ij. This makes the constraints appear as an elastic pseudo-bond with equilibrium length as measured. The $w_{I,ij}$ are weights designed to meet the above size criteria. In the preferred embodiment, they are calculated with an overall multiplicative factor limiting the contribution of $H_{I,NMR}$ to no more than approximately 5% of the total equilibrated energy. Their relative value is selected to reflect the lower reliability of longer measurements. Thus if $R^{(0)}_{I,ij}$ is between 0 and 3 Å, $w_{I,ij}$ has a relative value of 1; if the measurement is between 3 and 4.5 Å, the relative value is 2; if between 4.5 and 7 Å, the value is 3; and if the distance exceeds 7 Å, the term is dropped from the sum. Other alternative weight assignments meeting the general criteria are clearly possible.
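
The relative weight assignment just stated can be sketched as a simple lookup. This Python fragment is only an illustration: the function name is hypothetical, and the treatment of distances falling exactly on a boundary (3, 4.5, or 7 Å) is an assumption, since the text does not say which band such a value belongs to.

```python
def nmr_relative_weight(measured_distance):
    """Return the relative weight for a measured distance in Angstroms.

    Returns None when the distance exceeds 7 Angstroms, meaning the
    term is dropped from the sum. Boundary values are assigned to the
    lower band, which is an assumption of this sketch.
    """
    if measured_distance > 7.0:
        return None
    if measured_distance <= 3.0:
        return 1
    if measured_distance <= 4.5:
        return 2
    return 3
```

The overall multiplicative factor that caps the contribution of the constraint term is applied separately and is not modeled here.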

The consensus constraint term, as described, makes energetically unfavorable those moves which cause the candidate pharmacophore in each of the binders to depart from an average, shared configuration. In simulation equilibrium when the candidate is the actual pharmacophore, the binders share the pharmacophore structure and this term should be small. Since this is not a physical energy, in the case where the candidate pharmacophore is correct, this term should not be large compared to the total energy: in equilibrium no more than 10% of the total energy, and preferably approximately 5%. Further, the energetic disfavor should preferably be weighted by the affinity of each binder for the protein target, so that binders with greater affinity have a greater energetic effect.

Many forms of this energy meet these criteria. The preferred form is:

$$H_{I,consensus} = \sum_{\substack{pharmacophore \\ distance\ pairs}} w_I \left(R_{I,ij} - R^{(c)}_{ij}\right)^2 \tag{11}$$

$R^{(c)}_{ij}$, the shared consensus structure for the candidate pharmacophore, is an average of the interatomic distances between corresponding atomic positions, ij, in the shared pharmacophore in all binders. This makes the constraints appear as pseudo-bonds to a shared pharmacophore, which represents the binding to the protein target. The $w_I$ are weights designed to meet the above size criteria. In the preferred embodiment, they are calculated with an overall multiplicative factor limiting the contribution of $H_{I,consensus}$ to no more than approximately 5% of the total equilibrated energy. Their relative value is selected to reflect that binders with lower affinity are less reliable indicators of actual pharmacophore structure. Thus the relative value of the weights is proportional to the logarithm of the affinity of the corresponding binder, with an affinity of 1 μmolar having a relative weight of 1. Other weight assignments meeting the general criteria are clearly possible. The heuristic $H_{consensus}$ is the only Hamiltonian term linking together the various binders.
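
A minimal Python sketch of the consensus term follows, assuming each binder's pharmacophore distances are held in a dictionary keyed by atom pair. The function names and data layout are hypothetical, and the per-binder weight is passed in directly rather than derived from an affinity, since the exact normalization is not spelled out here.

```python
def consensus_distances(binder_distances):
    """Average each pharmacophore pair distance over all binders."""
    count = len(binder_distances)
    pairs = binder_distances[0].keys()
    return {
        pair: sum(distances[pair] for distances in binder_distances) / count
        for pair in pairs
    }


def consensus_energy(distances, consensus, weight):
    """Weighted sum of squared departures from the consensus distances."""
    return weight * sum(
        (distances[pair] - consensus[pair]) ** 2 for pair in consensus
    )
```

For example, two binders with a pair distance of 4.0 Å and 6.0 Å give a consensus distance of 5.0 Å, and each binder then contributes its weight times 1.0 Å² to the energy.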

All Hamiltonian components change only due to the dependence of the interatomic distances, $R_{I,ij}$, on the rigid units' torsional rotations. The $R_{I,ij}$ are the well known Euclidean distances between the atomic coordinates stored in the rigid unit records. Calculation of coordinate changes due to rotation by angle $\phi$ about a bond with unit direction $\underline{n}$ originating at atom A with position $\underline{x}$ is well known, but will be detailed. (Throughout, symbols representing vector quantities are indicated by underlining.) First, translate from the current coordinate origin to an origin at position $\underline{x}$ by subtracting $\underline{x}$ from all relevant coordinate vectors. Second, apply a rotation matrix, T, to the atomic coordinate vectors. Third, translate back to the prior coordinate origin from $\underline{x}$ by adding $\underline{x}$ to all relevant coordinate vectors. A rotation matrix is given by:

$$T = \cos(\phi)\,\mathbf{1} + \underline{n}\,\underline{n}^{T}\,[1 - \cos(\phi)] + M \sin(\phi), \qquad M = \begin{pmatrix} 0 & -n_z & n_y \\ n_z & 0 & -n_x \\ -n_y & n_x & 0 \end{pmatrix} \tag{12}$$

A reference for this computation is Goldstein, Classical Mechanics, Massachusetts, Addison-Wesley (1981), especially chapter 4, which is herein incorporated by reference.
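
The three-step rotation about a bond can be sketched in plain Python as follows. This is an illustrative sketch, not the listing from the microfiche appendix: the function names are hypothetical, and the sign convention chosen for the antisymmetric matrix M (a counterclockwise rotation of the atoms when viewed down the axis toward the origin) is an assumption.

```python
import math


def rotation_matrix(n, phi):
    """Rotation by angle phi about unit axis n (Rodrigues form)."""
    nx, ny, nz = n
    c, s = math.cos(phi), math.sin(phi)
    m = [[0.0, -nz, ny], [nz, 0.0, -nx], [-ny, nx, 0.0]]
    return [
        [
            c * (1.0 if i == j else 0.0) + (1.0 - c) * n[i] * n[j] + s * m[i][j]
            for j in range(3)
        ]
        for i in range(3)
    ]


def rotate_about_bond(point, origin, n, phi):
    """Rotate one atomic position about the bond axis n through origin."""
    t = rotation_matrix(n, phi)
    # Step 1: move the coordinate origin to the bond's starting atom.
    shifted = [p - o for p, o in zip(point, origin)]
    # Step 2: apply the rotation matrix.
    rotated = [sum(t[i][j] * shifted[j] for j in range(3)) for i in range(3)]
    # Step 3: translate back to the prior origin.
    return [r + o for r, o in zip(rotated, origin)]
```

With this convention, rotating the point (2, 1, 0) by 90 degrees about the z direction through (1, 1, 0) moves it to (1, 2, 0).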

Type I move generation 

Type I moves alter the side chain structure of a randomly chosen amino acid in a randomly chosen binder. These random choices are conventionally made by a random number subroutine. The chosen side chain is "removed" from the binder peptide and "grown" back rigid unit by rigid unit. For the next, i'th, rigid unit to be added, K possible new torsional angles are generated according to $p^{int}$. Preferably K is from 10 to 100. One of these torsional angles is selected according to $p^{ext}$, and the rigid unit is added at this new angle. Determination of $p^{ext}$ requires obtaining the normalization $w_i^{ext}$. At each step the $u^{int}$ and $u^{ext}$ used to calculate the respective probabilities include only interaction energies with rigid units present in other amino acids or already grown back. Rigid units not yet added are ignored. After all the side chain rigid units have been added back, $W^{new}$ is computed as the product of the normalization factors.
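
The selection step of this regrowth can be sketched as follows. This Python fragment assumes that the K trial angles have already been generated and that their external interaction energies are supplied as a list. The function names are hypothetical, and `beta` stands for the inverse temperature 1/kT.

```python
import math
import random


def select_trial(trial_energies, beta, rng):
    """Pick one trial by its Boltzmann factor; return (index, normalization)."""
    factors = [math.exp(-beta * u) for u in trial_energies]
    normalization = sum(factors)
    pick = rng.random() * normalization
    running = 0.0
    for index, factor in enumerate(factors):
        running += factor
        if pick < running:
            return index, normalization
    # Guard against floating-point round-off at the upper edge.
    return len(factors) - 1, normalization


def chain_weight(normalizations):
    """Product of the per-unit normalization factors."""
    weight = 1.0
    for value in normalizations:
        weight *= value
    return weight
```

A side chain with three rigid units would call `select_trial` three times, once per unit, and pass the three returned normalizations to `chain_weight`.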

Fig. 12 illustrates a Type I move for glutamate. At 141 the side chain has been removed. The first -CH2- unit is added back at 142 with new torsional angle $\phi_1$. The generation according to $p^{int}$ and selection according to $p^{ext}$ of this angle ignores energy interactions with the other side chain rigid units not yet added. At 143, the next -CH2- rigid unit is added back at angle $\phi_2$. Finally at 144, the last -CO2 rigid unit is added at angle $\phi_3$. For this last step interaction energies with all the rigid units are considered in generating and selecting the new angle.

$W^{old}$ is the weight for the reverse move, the move from the proposed new structure to the current configuration. For this, the proposed side chain is removed and "regrown" in its current structure unit by unit. For the next, i'th, unit generate K-1 possible new torsional angles according to $p^{int}$, again ignoring interactions with units yet to be added. The K'th new angle is the current angle for that unit. The current torsional angle is selected. Although $p^{ext}$ is not used, the normalization $w_i^{ext}$ is determined. After all units have been regrown at the current angles, $W^{old}$ is computed as the product of the normalizations.

The acceptance probability for the proposed side chain configuration is determined from equation 7 using $W^{new}$ and $W^{old}$.
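
The reverse-move bookkeeping can be sketched in Python as follows. Equation 7 is not reproduced in this part of the text, so the acceptance function assumes the ratio form that is standard for configurational bias Monte Carlo; that form and the function names are assumptions of this sketch.

```python
import math


def reverse_unit_weight(current_energy, fresh_trial_energies, beta):
    """Normalization for one unit of the reverse move.

    The current angle is always included as the K'th trial, alongside
    the K-1 freshly generated trial angles.
    """
    energies = [current_energy] + list(fresh_trial_energies)
    return sum(math.exp(-beta * u) for u in energies)


def type1_acceptance(w_new, w_old):
    """Assumed acceptance probability min(1, W_new / W_old)."""
    return min(1.0, w_new / w_old)
```

Including the current angle among the trials keeps the forward and reverse weights computed over sets of the same size K.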

Type II move generation 

Type II moves alter a limited region of the amino acid backbone beginning at a randomly chosen backbone rigid unit of a randomly chosen binder peptide in a manner consistent with conformational constraints due to internal disulfide bonds. These random choices are made similarly to those for Type I moves.

In Type II moves, side chains attached to the altered rigid units move rigidly with their backbone rigid units.

For this move, important geometric constraints must be met. In a randomly chosen binder and at a randomly chosen backbone bond between adjacent rigid units, a torsional angle rotation by $\phi_0$ is made. Subsequent backbone torsional rotations are chosen so that a minimum number of rigid units undergo a spatial displacement. This constraint fixes a limited number (if any) of possible subsequent torsional angles as a function of $\phi_0$ so that at most 4 rigid units are spatially displaced and rotated with at most 3 additional rigid units undergoing a rotation. This move is an important aspect of this invention and is required to maintain the conformational constraint due to the disulfide bridge. Since only 7 rigid units are spatially modified, the Type II move preserves the 8 amino acid cycle (20 rigid units), including the cystine side chain.

Fig. 13 illustrates a Type II move of a poly-glycine 7-mer. Rigid unit positions are indicated generally by black circles as at 1509 with incoming bonds generally as at 1502. A rigid unit (B unit) is illustrated in box 1515, and an amide bond (C unit) in box 1516. Backbone structure 1500 is transformed into structure 1501 by the Type II move generated by an initial rotation about bond 1502. Subsequent rotations about bonds 1503, 1504, 1505, 1506, 1507, and 1508 are thereby determined so that the rigid unit 1510 and at most three subsequent units undergo only a rotation without any spatial displacement. The four rigid units between units 1509 and 1510 undergo both a spatial displacement and a rotation as structure 1500 is transformed to structure 1501. No other backbone rigid units are altered.

The derivation of these assertions, including expressions for the allowed angles, is in Section 8, Appendix: Concerted Rotation. Fig. 14 defines notation used in this Appendix. Poly-glycine 7-mer backbone 1600 is the same as in Fig. 13. Rigid unit positions are indicated generally by black circles as at 1601 with incoming bonds generally as at 1602. The torsional rotations $\phi_0$ to $\phi_6$ are about bonds 1602 to 1608, respectively, between sequential, adjacent rigid units. The rigid unit position vectors $\underline{r}_0$ to $\underline{r}_6$, illustrated as vectors 1610 to 1616, respectively, define the position of these sequential rigid units with respect to a laboratory coordinate system with origin 1609. Summarizing this Appendix, the determination of the fixed torsional angles proceeds as follows. The allowed values for $\phi_1$ are the roots of equation 34, which depends on the $\phi_0$ driver angle and $\phi_2$ through $\phi_4$. But $\phi_2$ through $\phi_4$ can be determined in terms of $\phi_1$. Two solutions for $\phi_2$ are determined by equation 25 in terms of $\phi_1$. Two solutions for $\phi_3$ are determined by equation 29 in terms of the preceding $\phi$'s. Finally, a simple inversion of equation 32 determines one solution for $\phi_4$ in terms of the preceding $\phi$'s. Having found the allowed values of $\phi_1$, then equations 25, 29, and 32 determine corresponding allowed values for the other $\phi$'s, which in turn determine the alteration of the first four rigid units caused by the $\phi_0$ initial rotation.

More precisely, the final torsional angles determine position vectors $\underline{r}_1$ to $\underline{r}_4$ by applying rotation matrix 18 to equations 17 to obtain new position vectors in the laboratory coordinate system, the rotation matrices of equations 16 and 18 being determined by these final torsional angles. Position vectors $\underline{r}_0$ and $\underline{r}_5$ to $\underline{r}_6$ do not change. Then rigid unit 0 is translated to position $\underline{r}_0$; aligned so that its incoming bond axis is along the direction of the outgoing bond of unit -1; and finally rigidly rotated so that the end of its outgoing bond is at position $\underline{r}_1$. Rigid unit 1 is then translated to position $\underline{r}_1$; aligned so that its incoming bond axis is along the outgoing bond of unit 0; and rigidly rotated so that the end of its outgoing bond is at position $\underline{r}_2$. Rigid units 2 to 6 are then added to the backbone in a similar fashion. In this fashion the Type II move geometry is determined. Any side chains attached to these rigid units are rigidly rotated when their parent unit is rotated.

The Type II rotation is chosen in the following manner. Using the configurational bias prescription, the Hamiltonian is divided into $u^{int}$ and $u^{ext}$. $u^{int}$ is preferably 0, or alternatively is the torsional energy associated with the rigid unit of interest, while $u^{ext}$ includes all remaining interaction energies. In the previous manner, $u^{int}$ determines $p^{int}$, according to which are generated K' candidate $\phi_0$ rotation angles. Preferably K' is 1. Then the geometric constraints are solved for each candidate $\phi_0$. Typically, but not always, 6K', denoted K, possible backbone alterations are obtained. One of these is selected by $p^{ext}$, determined by:

$$p^{ext}(\phi^{(k)}) = \frac{\exp[-\beta u^{ext}(\phi^{(k)})]}{W^{new}}, \qquad W^{new} = \sum_{k=1}^{K} \exp[-\beta u^{ext}(\phi^{(k)})] \tag{13}$$

$u^{ext}$ includes all interactions not in $u^{int}$, that is all other backbone and side chain interactions. Because these determinations occur in torsional angle space and change the volume element in that space, the Jacobian, determined by equation 35, of the selected Type II move is also needed as a weight in the acceptance probability for detailed balance. This acceptance probability for Type II moves is:

$$accept(curr \to prop) = \min\left[1, \frac{W^{new} J^{new}}{W^{old} J^{old}}\right] \tag{14}$$



The weight and Jacobian of the reverse transformation from the proposed to the current structure are also needed in the acceptance probability for Monte Carlo detailed balance. These quantities are determined as follows. Using the proposed backbone structure just selected as the basis, generate a set of K'-1 new torsional angles according to $p^{int}$ and also include the current angle in the set. Then solve the geometric constraint to determine the permitted alterations. The current configuration, since it exists, must be among the permitted structures. From this set of permitted structures determine $W^{old}$ per equation 13. Then select the current configuration and compute the Jacobian $J^{old}$ per equation 35. This completes the determination of the acceptance probability.
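
Once the weights and Jacobians of the forward and reverse moves are known, the Type II acceptance probability is a one-line calculation. The following Python sketch assumes those four quantities have already been computed. The function names are hypothetical, and the Jacobians of equation 35 are treated as given inputs.

```python
import math


def solution_weight(external_energies, beta):
    """Sum of Boltzmann factors over all permitted backbone solutions."""
    return sum(math.exp(-beta * u) for u in external_energies)


def type2_acceptance(w_new, j_new, w_old, j_old):
    """Acceptance probability min[1, (W_new * J_new) / (W_old * J_old)]."""
    return min(1.0, (w_new * j_new) / (w_old * j_old))
```

For instance, a forward weight of 2.0 with Jacobian 0.5 against a reverse weight of 4.0 with Jacobian 1.0 gives an acceptance probability of 0.25.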

Proline is approximated. Proline is not subject to Type I moves. However, proline is subject to normal Type II moves, with its side chain bond to the amino nitrogen broken. The side chain thus moves rigidly with its backbone rigid unit as in a normal Type II move. To compensate for the broken bond approximation, the Cα-N torsional energy amplitude in the proline backbone is set at approximately 5 kcal/mole. (By contrast the torsional energy in a typical amino acid of the Cα-N bond is approximately 0.3 kcal/mole.) This invention is adaptable to other suitable approximations for proline. Alternatively, the proline side chain may be subject to alterations which preserve its cyclicity, such as, for example, by an extension of the constraint scheme just described.

Program detailed description

The following describes the construction and use of a computer method and apparatus to perform the method of step 5. The listing of this code is included in a microfiche appendix to this specification. Fig. 15 is a general view of the computer system and its internal data and program structures. To the left in Fig. 15 are the principal data structures of this method. Current structures 1701 contains the current structures of the N binders represented in memory as described. Proposed structure 1702 contains working memory areas used to generate a proposed new structure for one binder peptide. Structures 1701 and 1702 would typically be stored in RAM memory of the computer system, RAM memory being memory directly accessible to processor fetches. Stored structures 1703 contain similar memory representations of all the peptide structures generated, accepted, and selected for storage. This is typically stored on permanent disk file(s).

Candidate pharmacophore structures 1704 are input to the programs from either a disk file or the display and input unit 1712. The identified candidate structures are used to determine the weights in Eqn. 11.

Parameters 1705 comprises several parts. First are all the AMBER atomic interaction definitions and parameters. Second are standard representations of the amino acids including component rigid units and atomic charge assignments. Third are parameters controlling the run. These further comprise, by example, values for K and K', the Type I/II move branching ratio, the number of moves made in the simulation run, the simulation total energy record, etc. The parameters would typically be loaded from disk file(s) into RAM memory for manipulation during a simulation run.

Unit 1712 includes display and input devices for monitoring and control. Depicted on the display are the total number of moves made in the current run and the course of the total energy, which is similar to that illustrated in Fig. 9.

Processor 1711 is loaded with necessary programs prior to a simulation run and executes the programs to perform the simulation method. The general structure consists of main program 1706, structure modification program 1707, Type I and II move generators 1708 and 1709, and subroutines 1710. The subroutines consist of common utility subprograms, such as for performing torsional rotations about bonds and computing interaction energies by the previous methods, and conventional library subprograms, such as for performing input and output and finding random numbers. Any scientifically adequate random number generator can be used. A reference for random number generators is Press et al., Numerical Recipes: The Art of Scientific Computing, Cambridge, U.K., Cambridge University Press (1986), chapter 7. The invention is equally adaptable to other program structures that will occur to those skilled in computer simulation arts.

The preferred embodiment of these structures is an Indigo 2 workstation from Silicon Graphics (Mountain View, CA). Alternatively, any high performance workstation, such as products of Hewlett-Packard, IBM or Sun Microsystems, could be used. Preferably the data and program structures are coded in the C computer language. Alternatively any scientifically oriented language, such as Fortran, could be used. Conventional subroutine and scientific subroutine libraries are used where appropriate.

The program components will now be described in detail with reference to Figs. 16, 17, 18, and 19. Fig. 16 illustrates main program 1706. The peptide sequences of the N binders are input at step 1801. All necessary AMBER parameters - bond lengths and angles, atomic types and charges, interaction parameters, amino acid definitions, etc. - are input at step 1802. Step 1803 creates initial structures from this input data. Rigid unit records for all rigid units are created and linked to represent peptides. The geometric structures of these peptides either are obtained from a prior run or are built by adding side chains to a prototypical backbone characteristic of the library of the binder. A prototypical backbone for the CX6C library is found in the microfiche appendix heading CX6C.CAR. The binder structures are stored in the current structure data areas in preparation for beginning the main steps of the method.

Step 1804 begins the main loop of the simulation with the generation of a proposed modified structure for one of the binder peptides by structure modification program 1707. As part of proposed structure generation, an acceptance probability, accept(curr->prop), is determined as previously described. The proposed structure will be accepted at 1805 based on this probability. For example, a random number between 0 and 1 is generated, and the proposed structure accepted if the random number is less than the acceptance probability. If the proposed structure is accepted, then it is tested for sufficient distinctiveness at step 1806. This test is met if at least one atomic position in the proposed structure differs from the corresponding position in the current structure by at least approximately 0.2 Å. If the proposed structure is distinct, it is stored at 1807 in the structure store for later analysis. Whether distinct or not, the accepted proposed structure for the peptide replaces the corresponding current structure at step 1808.
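
The acceptance and distinctiveness tests of steps 1805 and 1806 can be sketched in Python as follows. The function names and the representation of a structure as a list of (x, y, z) tuples in Å are assumptions of this example.

```python
import math
import random


def accept_move(acceptance_probability, rng):
    """Accept when a uniform random number in [0, 1) is below the probability."""
    return rng.random() < acceptance_probability


def is_distinct(current, proposed, threshold=0.2):
    """True when at least one atom has moved by the threshold or more."""
    return any(
        math.dist(old, new) >= threshold for old, new in zip(current, proposed)
    )
```

A structure in which every atom moved by less than 0.2 Å would still replace the current structure if accepted, but it would not be written to the structure store.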

The simulation is tested for completion at step 1809. Completion can be controlled by the operator at station 1712 depending on display of run progress results. Alternatively, termination can be mechanically controlled. After completing a certain number of total moves after run energy equilibration, the moves being split between Types I and II according to the specified branching ratio, the run is terminated. The preferred number of total moves is 25,000, and the preferred Type I/II branching ratio is 4. Thus it is preferred to have 20,000 Type I and 5,000 Type II moves after equilibration per simulation run.

At step 1810, the stored structures are analyzed to determine both the consensus pharmacophore structure and the structures of the remainder of the binders. In the preferred embodiment, atomic positions in the equilibrated stored structures for each peptide are averaged to obtain the predicted geometric structure. The shared pharmacophore structure is obtained from the predicted structure of each peptide, again by averaging the shared position information for all peptides. Alternatively, before structure averaging, the structures generated for each binder can be clustered into similar groups and the clusters for each peptide separately averaged. The clusters would represent alternative peptide folding patterns. It is anticipated that because preferred binders are short peptides constrained by disulfide bridges, any alternative foldings identified will be structurally similar. The clustering can be done by the exemplary methods found in the previously referenced article Gordon et al., Fuzzy cluster analysis of molecular dynamics trajectories, Proteins: Structure, Function, and Genetics 14:249-264 (1992). For all analysis methods, the choice of the preferred number of stored moves is adjusted to achieve adequate estimated statistical position errors. Further, preferably, the results of three runs are combined to achieve increased statistical confidence.
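
The averaging of stored structures can be sketched in Python as follows. The sketch assumes that every stored structure lists the same atoms in the same order and that all structures are expressed in a common coordinate frame, which the text does not state explicitly. The function name is hypothetical.

```python
def average_structure(stored_structures):
    """Average each atom's (x, y, z) position over all stored structures."""
    count = len(stored_structures)
    atom_count = len(stored_structures[0])
    return [
        tuple(
            sum(structure[atom][axis] for structure in stored_structures) / count
            for axis in range(3)
        )
        for atom in range(atom_count)
    ]
```

When clustering is used, the same function could be applied to each cluster separately to give one averaged structure per folding pattern.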

Other information is also output. Particularly important is the course of the total energy for each peptide and for all the peptides, and the intramolecular, consensus, and constraint components of the energies. These energy components are used in determining whether a consensus pharmacophore has been found. As previously described, this is preferably done by ensuring that $H_{consensus}$ is small compared to the total energy and is minimized by a particular candidate pharmacophore. Also $H_{NMR}$ must be relatively small.

Finally at 1811, all results are output in a form usable for the subsequent steps 6 and 7 of Fig. 1. For example, this may be a particular file format suitable for subsequent lead compound search by a database query.

Turning now to Fig. 17, structure modification program 1707 will be described. This is invoked from the main program at 1804. Upon entry, this program randomly picks one of the binder peptides at 1901 for which to generate a proposed structure and also picks which type of move to use at 1902. The latter random choice is made according to an adjustable Type I/II branching ratio (preferably 4). For a Type I move, step 1903 picks a random amino acid side chain of the selected peptide, and step 1904 invokes the Type I move program. (Proline has no Type I moves.) For a Type II move, step 1905 picks a random backbone bond between rigid units to rotate and also a random direction from the picked bond along which backbone rigid unit structure will be altered. Step 1906 invokes the Type II move program.
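
The branching-ratio choice can be sketched in Python as follows. The sketch reads a branching ratio of 4 as four Type I moves for every Type II move, consistent with the preferred 20,000 and 5,000 moves out of 25,000. The function names are hypothetical.

```python
import random


def pick_move_type(branching_ratio, rng):
    """Return "I" with probability ratio / (ratio + 1), otherwise "II"."""
    type1_probability = branching_ratio / (branching_ratio + 1.0)
    return "I" if rng.random() < type1_probability else "II"


def expected_split(total_moves, branching_ratio):
    """Expected (Type I, Type II) move counts for a given total."""
    type2_moves = round(total_moves / (branching_ratio + 1))
    return total_moves - type2_moves, type2_moves
```

Because each choice is random, an actual run will only approximate the expected split.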

Figs. 18A and 18B illustrate the Type I move generator 1708, which is defined by equations 6 and 7. With reference first to Fig. 18A, the proposed structure of the selected peptide is created from its current structure by removing the selected side chain. All intra-molecular interactions are subsequently determined with respect to the proposed structure absent side chain rigid units not yet regrown. K candidate new torsional angles for the next, i'th, rigid unit to add are generated by $p^{int}$ at 2002. Preferably K is between 10 and 100. Generation of these angles uses the conventional rejection method referenced in Press et al. at § 7.3. The weight $w_i^{ext}$ and $p^{ext}$ are determined for each of these candidate angles. This requires the rigid unit to be added to be rotated to the candidate angle using the previous






rotation method. Candidate interaction energy is determined from candidate interatomic distances resulting from the candidate rotation. One of the candidate angles is probabilistically selected at 2003 and the rigid unit added back at this torsional angle at 2004. If there are more units to add, which is tested at 2005, these steps are repeated. If not, the acceptance weight $W^{new}$ is determined as the product of the $w_i^{ext}$ at 2006. Lastly the old weight is determined at 2007. From the weights the move acceptance probability is found for use at 1805.
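
The rejection method for generating candidate torsional angles can be sketched in Python as follows. The sketch assumes a uniform proposal over [-π, π] and a known lower bound `u_min` on the internal energy, so that the acceptance ratio never exceeds 1. Both choices, and the function name, are assumptions of this example rather than details of the referenced listing.

```python
import math
import random


def sample_torsion(u_int, u_min, beta, rng):
    """Draw an angle with probability proportional to exp(-beta * u_int(phi))."""
    while True:
        phi = rng.uniform(-math.pi, math.pi)
        # Accept with the Boltzmann factor relative to the energy minimum.
        if rng.random() < math.exp(-beta * (u_int(phi) - u_min)):
            return phi
```

When the internal energy is taken as 0, every proposal is accepted and the candidate angles are simply uniform.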

Fig. 18B details the determination 2007 of $W^{old}$, the weight for the reverse move from the proposed to the current side chain structure. Temporarily the proposed structure is used as a basis for energy determination at 2008, and then the current structure is restored at 2016, when this process is finished. The proposed side chain is removed at 2009 for regrowth rigid unit by rigid unit as in Fig. 18A. For the next, i'th, rigid unit to be added back, K-1 candidate angles are generated according to $p^{int}$ at 2010 with the current value of that angle for the K-th candidate at 2011. As previously, the weight $w_i^{ext}$ is determined for these candidate angles at 2012. The rigid unit is added back at the current, K-th, angle at 2013. If there are more units to add, tested at 2014, these steps are repeated. If not, the acceptance weight $W^{old}$ is determined as the product of the $w_i^{ext}$ at 2006.

Figs. 19A and 19B illustrate Type II move generator 1709, which is defined by equations 13 and 14 and the concerted rotation geometric constraints. With reference to Fig. 19A, K' candidate new torsional angles for the selected backbone bond are generated by $p^{int}$ using the rejection method. Preferably K' is 1. Torsional rotations about adjacent backbone bonds, in the selected direction along the backbone, permitted by the concerted rotation constraints are determined from the roots of equation 34 at 2102. Equation 34 depends on intermediate variables obtained from equations 25, 29, and 32 and determined in that order. The roots are simply found by searching the interval [-π, π] in 0.04° increments. When a root is located in a 0.04° segment, it is refined with the bisection method referenced in Press et al. at § 9.1. It is expected on the average that 6K' solutions will be found. If no roots are found at 2103, the candidate rotation is impossible and this move is skipped. If solutions exist, next, at 2104, $p^{ext}$ and $W^{new}$ are determined. Using the described rotation method, the backbone rigid units are rotated (with consequent spatial displacement of 4 units) to a candidate torsional angle solution about their mutual bonds. Additionally, any side chains attached to backbone rigid units are rigidly rotated using the same method. Having made these rotations, candidate interatomic distances and candidate interaction energies can be determined and used to obtain $p^{ext}$ for this candidate solution. One of the candidates is probabilistically selected at 2104, and the backbone and any side chains are rotated according to this candidate into the proposed structure. The Jacobian of this transformation is determined at 2106 by equation 35. Lastly the old acceptance weight and Jacobian are determined at 2107. From the weights and Jacobians the move acceptance probability is found for use at 1805.
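
The root search described for equation 34 can be sketched in Python as a scan followed by bisection. The function `f` stands in for equation 34, which is not reproduced in this part of the text. The function name and the convergence tolerance are assumptions of this example.

```python
import math


def find_roots(f, lo=-math.pi, hi=math.pi, step=math.radians(0.04), tol=1e-10):
    """Scan [lo, hi] in fixed steps and refine each sign change by bisection."""
    roots = []
    step_count = int(round((hi - lo) / step))
    left, f_left = lo, f(lo)
    for k in range(1, step_count + 1):
        right = lo + k * step
        f_right = f(right)
        if f_left == 0.0:
            # The grid point itself is a root.
            roots.append(left)
        elif f_left * f_right < 0.0:
            a, b, f_a = left, right, f_left
            while b - a > tol:
                mid = 0.5 * (a + b)
                f_mid = f(mid)
                if f_a * f_mid <= 0.0:
                    b = mid
                else:
                    a, f_a = mid, f_mid
            roots.append(0.5 * (a + b))
        left, f_left = right, f_right
    return roots
```

A scan of this kind can miss a pair of roots that both fall inside one 0.04° segment, since no sign change is then seen between the segment's endpoints.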

Fig. 19B details the determination 2107 of $W^{old}$ and $J^{old}$ for the reverse move from the proposed to the current side chain structure. Temporarily the proposed structure is used as the basis for energy determination at 2008, and the current structure is restored at 2016, when this process is finished. At 2109, a set of K'-1 candidate torsional angles is generated for the selected backbone bond according to $p^{int}$ using the rejection method and the current torsional angle is added to this set. If, as preferred, K' is 1, this step results in a set with only the current angle. At 2111, similarly to 2102, the permitted torsional rotations about adjacent backbone bonds are determined from the equations expressing the concerted rotation constraints. Special care is taken to ensure that the original conformation is found by the root finding procedure. In particular, the search interval is centered on the known original $\phi_1$ and is made as small as necessary to isolate the root, which may be as small as 0.004° or smaller. The current structure must be among these solutions, since it exists. Select it at 2112. $W^{old}$ is computed from the candidate angle solutions, making the candidate rotations and determining candidate interactions. Also the Jacobian, $J^{old}$, of the transformation is computed from the proposed to the current structure.

5.8. CONSENSUS STRUCTURE TEST 
Having selected a candidate pharmacophore and determined a best possible consensus structure and best possible structures for the remainder of the binder molecules, the consensus test, step 6, tests whether a consensus structure has actually been found. A consensus pharmacophore structure consists of a spatial arrangement of chemical groups shared by all the N binders to high accuracy. Since an actual pharmacophore exists, the N specifically binding members of the screened libraries will share the actual structure. However, the remainder of binder molecules will share no other similar structures to such a high accuracy. Therefore, a structure consensus of the N binders is possible only if the candidate pharmacophore is the actual physical pharmacophore responsible for the actual binding. If the candidate selected relates to other parts of the binder molecules, no structure consensus will be found. Further, if the Monte Carlo determination attempts to impose a consensus on parts of the binder molecules that do not share structure, an inconsistent overall structure will be obtained for the remainder of the binder molecules.

Therefore, two preferred consensus tests are applied: one test asks whether a consistent candidate pharmacophore has been obtained, and a second test asks whether consistent structures have been obtained for the remainder of the binder molecules. Both tests have a preferred absolute and a less preferred relative version.

There are two portions for the first test. First, are all the consensus pharmacophore distances obtained in the N binders within at least a specified distance, preferably approximately 0.25 Å, of each other? Second, is the consensus energy, $H_{consensus}$, relatively small compared to the total molecular energy (e.g., less than at most approximately 5-10% of the total molecular energy) as determined by the Monte Carlo method?
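
The two portions of the first test can be sketched in Python as follows. The sketch assumes each binder's pharmacophore distances are held in a dictionary keyed by atom pair, compares the energies by magnitude, and uses 10% as the default energy limit. The function name, the data layout, and the use of the absolute value of the total energy are assumptions of this example.

```python
def passes_pharmacophore_test(binder_distances, consensus_energy, total_energy,
                              distance_tolerance=0.25, energy_fraction=0.10):
    """Check distance agreement among binders and the size of the consensus energy."""
    for pair in binder_distances[0]:
        values = [distances[pair] for distances in binder_distances]
        # Every binder must agree on this distance to within the tolerance.
        if max(values) - min(values) > distance_tolerance:
            return False
    return consensus_energy <= energy_fraction * abs(total_energy)
```

A candidate that fails either portion would be rejected as not being the actual pharmacophore under the absolute version of the test.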

There are also two portions of the second test. First, can the intramolecular distances predicted by the Monte Carlo method be confirmed by additional distance measurements? Second, since the Monte Carlo method utilizes distance constraints previously measured, one or more of these measurement constraints can be ignored and the predicted distance checked against that measured distance. Tolerances for these tests are distance agreements to within at least a specified amount, e.g., approximately 0.5 Å, in each binder.

The two preferred tests have been described in the
absolute version as requiring checks against absolute
tolerances. Alternatively, the values of the pharmacophore
distance differences among the binders, H_consensus, and the
differences of the predicted and measured distances can be
accumulated for all the possible candidate pharmacophores,
the candidate selected being that one minimizing these
departures. Therefore, the selected candidate will have the
minimum values for the differences of the pharmacophore
distances in the binders, the minimum value for H_consensus, and
the minimum values of the differences of predicted from
measured distances.



This invention is adaptable to other tests that evaluate
the consistency of the consensus structure obtained for the
candidate pharmacophore and the accuracy of the structure
obtained for the remainder of the binder molecules.
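
The relative version of the tests amounts to ranking the candidate pharmacophores by their accumulated departures. The sketch below illustrates one way this might be done. The text does not say how the three departures are combined, so the weighted sum, the weights, and the function and candidate names are all assumptions made only for illustration.

```python
def select_candidate(departures, weights=(1.0, 1.0, 1.0)):
    """Hypothetical helper: pick the candidate with the smallest departures.

    departures maps a candidate name to a tuple of
    (pharmacophore distance spread, consensus energy, prediction error).
    The weighted sum is an assumed way of combining the three measures.
    """
    def score(item):
        _, values = item
        return sum(w * v for w, v in zip(weights, values))

    best_name, _ = min(departures.items(), key=score)
    return best_name
```

A candidate that is smaller in all three measures is selected regardless of the weights; the weights matter only when the measures disagree.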

LEAD COMPOUND DETERMINATION
Having started at step 1 with a target of interest, upon
completion of step 6 of Fig. 1 a high resolution
pharmacophore structure has been determined as well as
supporting structures of the N binder peptides. This high
resolution structure is used in step 7 to determine lead
compounds for use as a drug that will bind to the original
target of interest.

Thus, one or more lead compounds are determined that
share a pharmacophore specification with the determined
consensus pharmacophore structure. This determination is
preferably done by one of several methods: by a search of a
database of potential drug compounds or of chemical
structures (e.g., the Standard Drugs File (Derwent
Publications Ltd., London, England), the Beilstein database
(Beilstein Information, Frankfurt, Germany or Chicago), and
the Chemical Registry (CAS, Columbus, OH)) to identify
compounds that contain the pharmacophore specification; by
modification of a known lead compound to include the
pharmacophore specification; by synthesis of a
structure containing the pharmacophore specification; or by
modification of binders to the target molecule (e.g.,
isolated in step 2) outside of the pharmacophore structure to
render the binder more attractive for use as a drug (e.g., to
increase half-life, solubility, or ability to achieve desired in
vivo localization).

Database search queries are based not only on chemical
property information but also on precise geometric
information. Computer-based approaches rely on database
searching to find matching templates; Y.C. Martin, Database
searching in drug design, J. Medicinal Chemistry, vol. 35, pp.
2145-54 (1992), which is herein incorporated by reference.
Existing methods for searching 2-D and 3-D databases of
compounds are applicable to this step. Lederle of American
Cyanamid (Pearl River, New York) has pioneered molecular
shape-searching, 3D searching and trend-vectors of databases.
Commercial vendors and other research groups have enhanced
searching capabilities (MACCS-3D, Molecular Design Ltd. (San
Leandro, CA); CAVEAT, Lauri, G. et al., University of
California (Berkeley, CA); CHEM-X, Chemical Design, Inc.
(Mahwah, N.J.)).

The pharmacophore structure determined in this invention
is adaptable to any of these methods and sources of chemical
database searching and to the enumerated non-database
methods. Output will be lead compounds suitable for drug
design. An important aspect of this invention is that the
high resolution pharmacophore structure will lead to highly
targeted leads. Lower resolution structures result in a
geometric increase in the number of lead compound query
matches. Example 1 illustrates this effect.
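
The dependence of the number of matches on the geometric tolerance can be illustrated with a toy distance query. This is only a sketch: the three-compound database, its distances, and the function names are hypothetical, and the commercial 3D search systems named above use their own query languages rather than this code.

```python
def matches(query, candidate, tol):
    """True if every candidate distance is within tol (angstroms) of the query."""
    return all(abs(q - c) <= tol for q, c in zip(query, candidate))


def search(database, query, tol):
    """Return the names of all compounds whose distances match the query."""
    return [name for name, dists in database.items()
            if matches(query, dists, tol)]


# Hypothetical database: compound name -> two inter-group distances.
toy_database = {
    "cmpd_a": (5.0, 6.9),
    "cmpd_b": (5.4, 7.3),
    "cmpd_c": (6.5, 8.2),
}
toy_query = (5.1, 7.0)
```

With this toy data, a tolerance of 0.25 Å retrieves one compound, 0.5 Å retrieves two, and 2.0 Å retrieves all three, which mirrors the trend the text describes.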

5.10. APPENDIX: CONCERTED ROTATION

Since the preferred molecules under consideration are
conformationally constrained by disulfide bridge(s), a Monte
Carlo move that preserves this constraint is required. The
concerted rotation scheme used for alkanes can be extended
to allow rotation of the torsional angles in conformationally
constrained peptides. This appendix describes this
extension. Dodd et al. (1993) discuss the original,
restricted method. (The essential extensions are expressed
in equations 27, 28, and 34.) This method is directly
applicable to the cyclic residue of proline, and an
alternative embodiment of this invention would thermally
perturb proline with a move of similar geometric constraints.
Fig. 14 illustrates the geometry under consideration.
Illustrated backbone 1600 is a poly-glycine 7-mer. Rigid
unit positions are indicated generally by black circles as at
1601 with incoming bonds generally as at 1602. The torsional
rotations $\phi_0$ to $\phi_6$ are about bonds 1602 to 1608, respectively,
between sequential, adjacent rigid units. The rigid unit
position vectors $\mathbf{r}_0$ to $\mathbf{r}_6$, illustrated as vectors 1610 to
1616, respectively, define the position of these sequential
rigid units with respect to a laboratory coordinate system
with origin 1609. A C$_\alpha$ rigid unit (B unit) is illustrated in
box 1630, and an amide bond (C unit) in box 1631.
To formulate this method, let us consider rotating about
seven torsional angles, which will displace the root
positions and rotate four rigid units, rotate up to three
additional ones, and leave the rest of the peptide fixed.
The root position of a rigid unit is the C$_\alpha$ position for a B
unit, the C position for a C unit, the C position for a CH$_2$
unit, and the S position for the S unit in cystine. If unit
5 is a C unit, however, $\mathbf{r}_5$ is defined to be the backbone amino
nitrogen position of that unit. For each unit, let us define
$\theta_i$ to be the fixed angle between the incoming and outgoing
bonds. Thus, $\theta_i = 0$ for a C unit, and $\theta_i = 70.5^\circ$ for all
others.

WO 96/30849    PCT/US96/04229

The method leaves the positions $\mathbf{r}_i$ of units $i \le 0$ or $i \ge 5$
fixed. The torsion $\phi_0$ is changed by an amount $\delta\phi_0$. The
values of $\phi_i$, $1 \le i \le 6$, are then determined so that only the
positions $\mathbf{r}_i$ of units $1 \le i \le 4$ are changed.

The method requires several definitions to present the
solution for the new torsional angles. The bond vectors are
defined to be the difference in position between unit i and
unit i - 1, as seen in the coordinate system of unit i:

$$\mathbf{l}_i = \mathbf{r}_i^{(i)} - \mathbf{r}_{i-1}^{(i)} \qquad (15)$$

The bond vectors are illustrated in Fig. 14 at 1620 to
1624. The length and orientations of the $\mathbf{l}_i$ are
determined by rigid unit structure and the length and angle
AMBER parameters for bonds between atom types. The
coordinate system of unit i is such that the incoming bond is
along the $\hat{x}$ direction. Thus $\mathbf{l}_i = l_i \hat{x}$ if atoms $\mathbf{r}_i$ and $\mathbf{r}_{i-1}$
are directly bonded to each other, and $\mathbf{l}_i$ has both x- and
y-components otherwise. Here $\hat{x}$ is a fixed unit vector along the x
direction. Now define a rotation matrix that transforms from
the coordinate system of unit i+1 to unit i:

$$T_i = \begin{pmatrix} \cos\theta_i & \sin\theta_i & 0 \\ \sin\theta_i\cos\phi_i & -\cos\theta_i\cos\phi_i & \sin\phi_i \\ \sin\theta_i\sin\phi_i & -\cos\theta_i\sin\phi_i & -\cos\phi_i \end{pmatrix} \qquad (16)$$
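
The matrix of Eqn. 16 can be written as a small function and checked numerically. The sketch below is an illustration in plain Python, not code from the program listings; the function names are assumptions, and angles are taken in radians.

```python
import math


def torsion_matrix(theta, phi):
    """Matrix of Eqn. 16: transforms from the frame of unit i+1 to unit i."""
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    return [
        [ct, st, 0.0],
        [st * cp, -ct * cp, sp],
        [st * sp, -ct * sp, -cp],
    ]


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))
```

The rows of the matrix are mutually orthogonal unit vectors for any pair of angles, so the transformation preserves bond lengths.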



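The positions of the units in the frame of unit 1 are, thus,
given by:

$$\mathbf{r}_1^{(1)} = \mathbf{l}_1, \qquad \mathbf{r}_i^{(1)} = \mathbf{r}_{i-1}^{(1)} + T_1 T_2 \cdots T_{i-1}\,\mathbf{l}_i, \quad i > 1 \qquad (17)$$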
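
The unit positions in the frame of unit 1 can be accumulated bond by bond, each bond vector being carried back to frame 1 by the product of the preceding rotation matrices. The following sketch assumes that recursion (position of unit i equals position of unit i-1 plus the product T_1...T_{i-1} applied to bond vector i); the function names and argument layout are illustrative assumptions.

```python
import math


def torsion_matrix(theta, phi):
    """Rotation matrix of Eqn. 16 (frame of unit i+1 to frame of unit i)."""
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    return [[ct, st, 0.0], [st * cp, -ct * cp, sp], [st * sp, -ct * sp, -cp]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def matvec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


def positions_in_frame_1(bonds, thetas, phis):
    """bonds[k] is bond vector k+1 in its own unit frame;
    thetas[k] and phis[k] are the angles of unit k+1."""
    acc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    positions = [list(bonds[0])]  # position of unit 1 is bond vector 1
    for i in range(1, len(bonds)):
        # Extend the accumulated product T_1 ... T_i by one matrix.
        acc = matmul(acc, torsion_matrix(thetas[i - 1], phis[i - 1]))
        step = matvec(acc, bonds[i])
        positions.append([p + s for p, s in zip(positions[-1], step)])
    return positions
```

Because every matrix is a rotation, successive positions stay one bond length apart whatever torsions are chosen, and the angle between consecutive bonds equals the fixed angle of the intervening unit.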



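Further define the matrix that converts from the frame
of reference of unit 1 to the laboratory reference frame

[Eqn. (18), defining the matrix $T_1^{\mathrm{lab}}$, is not legible in the source.]

where

[Eqn. (19), defining the matrix M and the quantities $\cos\psi$ and $\sin\psi$, is not legible in the source.]

where $\hat{x}$ is the axis of the bond coming into unit 1. The
matrix A is a rotation about $\hat{x}$ and is defined so that

[Eqn. (20) is not legible in the source.]

where

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & c & -s \\ 0 & s & c \end{pmatrix}, \qquad c = \frac{l_{1y}\Delta r_y + l_{1z}\Delta r_z}{\Delta r_y^2 + \Delta r_z^2}, \qquad s = \frac{-l_{1z}\Delta r_y + l_{1y}\Delta r_z}{\Delta r_y^2 + \Delta r_z^2} \qquad (21)$$

Here $\Delta\mathbf{r}$ is defined by one expression if unit 0 is a C unit and by
another otherwise; neither expression is legible in the source.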



The method proceeds by solving for $\phi_i$, $2 \le i \le 6$,
analytically in terms of $\phi_1$. Then a nonlinear equation is
solved numerically to determine which values of $\phi_1$, if any,
are possible for the chosen value of $\phi_0$.

The derivation proceeds in the coordinate system of unit
1, after it has been rotated by the chosen $\phi_0$. Define

[Eqn. (22) is not legible in the source.]



If $\theta_3 \neq 0$ and $\theta_5 \neq 0$, one can see from Fig. 14 that the
squared distance between unit 3 and unit 5 is known and equal to

$$|\mathbf{r}_5 - \mathbf{r}_3|^2 = (l_{4x}\cos\theta_4 - l_{4y}\sin\theta_4 + l_{5x})^2 + (l_{4x}\sin\theta_4 + l_{4y}\cos\theta_4 + l_{5y})^2 \qquad (23)$$

But this distance can also be written as

[Eqn. (24), which also defines the vector $\mathbf{x}$, is not legible in the source.]

Equating these two results, two values of $\phi_2$ are possible:

$$\phi_2^{I} = \arcsin(c_2) - \arctan(x_y/x_z) - H(x_z)$$
$$\phi_2^{II} = \pi - \arcsin(c_2) - \arctan(x_y/x_z) - H(x_z) \qquad (25)$$

with

$$H(x) = \begin{cases} 0, & x > 0 \\ \pi, & x < 0 \end{cases} \qquad (26)$$

The constant $c_2$ is given by

$$c_2 = \frac{|\mathbf{r}_5 - \mathbf{r}_3|^2 - x^2 - l_3^2 + 2x_x(\cos\theta_2\, l_{3x} + \sin\theta_2\, l_{3y})}{-2(\sin\theta_2\, l_{3x} - \cos\theta_2\, l_{3y})(x_y^2 + x_z^2)^{1/2}}, \qquad \theta_3 \neq 0,\ \theta_5 \neq 0 \qquad (27)$$

[The three remaining cases of Eqn. (27), for $\theta_3$ and/or $\theta_5$ equal to zero, are not legible in the source.]

where $\mathbf{x}$ is given by Eqn. 24 if $\theta_5 \neq 0$, and by a separate
expression, not legible in the source, if $\theta_5 = 0$. Clearly, for there
to be a solution, $|c_2| \le 1$.
The last three equations for $c_2$ were determined by conditions
similar to equating Eqns. 23 and 24. For $\theta_3 = 0$, $\theta_5 \neq 0$, and for
$\theta_3 \neq 0$, $\theta_5 = 0$, the x component of the vector from unit 3 to
unit 5 is fixed by the bond lengths and the fixed bond angle. For
$\theta_3 = 0$, $\theta_5 = 0$, the angle between two of the bond vectors is fixed.
To determine $\phi_3$, two expressions for $|\mathbf{r}_5 - \mathbf{r}_4|^2$ are again
equated to determine that:



= -^s -y'-i« ^2y^(cose3J,^-^sine3i,J 
2 (sine,J,,-cose,J,^) (y;*y|) ^'^ 



(28) 



10 



(29) 



15 



<t>f = arcsin(Cj) - arctan(yy/y,) - H (y,) 
^" = 7t-arcsin(Cj) - arctan (y^/y^) - h {y^) , 

where^ y = f71[r?i:-2^) -1^ . . Again. | c, | < 1 for there to be 
a solution. 

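If $\theta_5 \neq 0$, the value of $\phi_4$ can now be determined from:

$$\mathbf{r}_5^{(1)} = \mathbf{r}_4^{(1)} + T_1 T_2 T_3 T_4\, \mathbf{l}_5. \qquad (30)$$

Defining

$$\mathbf{q} = T_3^{-1} T_2^{-1} T_1^{-1} (\mathbf{r}_5^{(1)} - \mathbf{r}_4^{(1)}), \qquad (31)$$

the equations that define $\phi_4$ are given by

$$q_y = \cos\phi_4\,(\sin\theta_4\, l_{5x} - \cos\theta_4\, l_{5y})$$
$$q_z = \sin\phi_4\,(\sin\theta_4\, l_{5x} - \cos\theta_4\, l_{5y}) \qquad (32)$$

This is a successful rotation if the position of $\mathbf{r}_6$ is
successfully predicted. That is, the equation

$$\mathbf{r}_6^{(1)} - \mathbf{r}_5^{(1)} = T_1 T_2 T_3 T_4 T_5\, \mathbf{l}_6 = [T_1^{\mathrm{lab}}]^{-1}(\mathbf{r}_6 - \mathbf{r}_5) \qquad (33)$$

must be satisfied. Consider the x-component, which implies

[Eqn. (34), which gives one condition for each combination of $\theta_3$ and $\theta_5$ being zero or nonzero, is not legible in the source.]

must be satisfied if the rotation is successful. The
equations for the case $\theta_5 = 0$ clearly express the geometric
conditions required for a successful rotation.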

Eqn. 34 is the nonlinear equation for $\phi_1$ because $\phi_2$, $\phi_3$,
and $\phi_4$ are determined by Eqns. (25), (29), and (32) in terms
of $\phi_1$. This equation has between zero and four values for
each value of $\phi_0$, however, due to the multiple root character
of Eqns. (25) and (29). The equation is solved by searching
the region $-\pi < \phi_1 < \pi$ for zero crossings. The search is in
increments of approximately 0.04. These roots are then refined by a
bisection method.
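
The search for zero crossings followed by bisection can be sketched generically. In the illustration below, the function g stands in for the left-hand side of Eqn. 34 as a function of the first torsion; the step of 0.04 is taken to be in radians, which is an assumption, and the function name and tolerance are likewise illustrative.

```python
import math


def find_roots(g, lo=-math.pi, hi=math.pi, step=0.04, tol=1e-10):
    """Scan [lo, hi] for sign changes of g and refine each by bisection."""
    roots = []
    a, ga = lo, g(lo)
    while a < hi:
        b = min(a + step, hi)
        gb = g(b)
        if ga == 0.0:
            roots.append(a)  # grid point that is exactly a root
        elif ga * gb < 0.0:
            # A sign change brackets a root: halve the bracket until small.
            left, right, gl = a, b, ga
            while right - left > tol:
                mid = 0.5 * (left + right)
                gm = g(mid)
                if gl * gm <= 0.0:
                    right = mid
                else:
                    left, gl = mid, gm
            roots.append(0.5 * (left + right))
        a, ga = b, gb
    return roots
```

As a stand-in test function, sin(2p) - 0.5 has four roots in the search region, matching the maximum number of solutions mentioned above. A scan of this kind can miss pairs of roots that lie closer together than one step.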

The transformation from $\phi_i$, $0 \le i \le 6$, to the new solution,
which is constrained to change only $\mathbf{r}_i$, $1 \le i \le 4$, actually
implies a change in volume element in torsional angle space.
This change in volume element is the reason for the
appearance of the Jacobian in the acceptance probability.
The Jacobian of this transformation is calculated in Dodd et
al. (1993) at pp. 991-93. It is slightly different here since
root position $\mathbf{r}_5$ is not necessarily the head position. The
Jacobian is given by

$$J = \frac{1}{|\det B|} \qquad (35)$$

where the 5 x 5 matrix B is given by $B_{ij} = [\mathbf{u}_j \times (\mathbf{r}_5 - \mathbf{h}_j)]_i$ for
$i \le 3$ and $B_{ij} = [\mathbf{u}_j \times (\mathbf{r}_6 - \mathbf{r}_5)/|\mathbf{r}_6 - \mathbf{r}_5|]_{i-3}$ for $i = 4, 5$. Here
$\mathbf{h}_j = \mathbf{r}_j$, except that the head position is used even if $\theta_5 = 0$, and
$\mathbf{u}_i$ is the incoming bond vector for unit i.

Repeated application of the concerted rotation may lead
to a slightly imperfect structure, due to numerical precision
errors. In an alternative embodiment, peptide geometry would
be restored to an ideal state by application of the Random
Tweak algorithm after several thousand moves (Shenkin et al.,
1987, Biopolymers 26:2053-85).

The invention is further described in the following
examples, which are in no way intended to limit the scope of
the invention.

6. EXAMPLES

6.1. RELATION BETWEEN EFFECTIVENESS OF
POTENTIAL DRUG IDENTIFICATIONS AND
PHARMACOPHORE GEOMETRIC TOLERANCE

Searches of a drug library well known to medicinal
chemists, the Standard Drugs File (Derwent Publications Ltd.,
London, England), illustrate the geometric increase in the
number of compounds found (and thus decrease in expected
effectiveness of identification of potential drugs) as
pharmacophore geometric tolerance is increased. Table 4
tabulates the results.

Table 4

5HT3 (5-Hydroxytryptophan)

Tolerance (Å)    Number of drug compounds
2.0              64
1.0              35
0.5              27
0.25             12
0.10             1

Dopamine

Tolerance (Å)    Number of drug compounds
2.0              188
1.0              185
0.5              60
0.25             48
0.10             5

The pharmacophores are two well known neurotransmitters,
5-hydroxytryptophan and dopamine. As the tolerance of one
distance in the pharmacophore structure is decreased from 2.0
to 0.1 Å, the number of compounds retrieved from the database
is listed. The advantage of achieving pharmacophore
resolution better than approximately 0.25 Å is clear.

If the tolerance of three distances were involved, the
expected number of compounds retrieved would be the cube of
these numbers. For the dopaminergic pharmacophore, the
number of lead compounds would decrease from over 6.5x10^6 to
about 125 as three tolerances were decreased from 2.0 Å to
0.1 Å.
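
The cube estimate for three distances can be checked directly from the single-distance counts in Table 4. The short sketch below only reproduces that arithmetic; the function name is an assumption, and the cubing rule is the simplification stated in the text (three independent tolerances), not a measured result.

```python
# Single-distance counts for the dopamine pharmacophore, from Table 4.
dopamine_hits = {2.0: 188, 0.10: 5}


def expected_hits_three_distances(hits_one_distance):
    """Cube the single-distance count, as assumed in the text."""
    return hits_one_distance ** 3


loose = expected_hits_three_distances(dopamine_hits[2.0])   # 6644672
tight = expected_hits_three_distances(dopamine_hits[0.10])  # 125
```

The ratio of the two estimates is a little over 53,000, which is the size of the reduction in candidate leads attributed here to tightening three tolerances from 2.0 Å to 0.1 Å.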

This example illustrates the geometric increase in the
number of leads identified as pharmacophore geometry is less
well defined. It is thus a very preferred aspect of this
invention that the computational method results in
determining pharmacophore structure accurate to at least
approximately 0.25 to 0.30 Å. Thus an exponentially large
improvement in lead compound selection for drug design can be
expected to result from this invention.

6.2. EXPRESSION AND PURIFICATION
OF TARGET PROTEINS

Target molecules that are proteins, for example ras,
raf, vEGF and KDR, are expressed in the Pichia pastoris
expression system (Invitrogen, San Diego, CA) and as
glutathione-S-transferase (GST)-fusion proteins in E. coli
(Guan and Dixon, 1991, Anal. Biochem. 192:262-267).

The cDNAs of these target proteins are cloned in the
Pichia expression vectors pHIL-S1 and pPIC9 (Invitrogen).
Polymerase chain reaction (PCR) is used to introduce six
histidines at the carboxy-terminus of these proteins, so that
this His-tag can be used to affinity-purify these proteins.
The recombinant plasmids are used to transform Pichia cells
by the spheroplasting method or by electroporation.
Expression of these proteins is inducible in Pichia in the
presence of methanol. The cDNAs cloned in the pHIL-S1
plasmid are expressed as a fusion with the PHO1 signal
peptide and hence are secreted extracellularly. Similarly,
cDNAs cloned in the pPIC9 plasmid are expressed as a fusion
with the α-factor signal peptide and hence are secreted
extracellularly. Thus, the purification of these proteins is
simpler as it merely involves affinity purification from the
growth media. Purification is further facilitated by the
fact that Pichia secretes very low levels of homologous
proteins and hence the heterologous protein comprises the
vast majority of the protein in the medium. The expressed
proteins are affinity purified onto an affinity matrix
containing nickel. The bound proteins are then eluted with
either EDTA or imidazole and are further concentrated by the
use of centrifugal concentrators.

As an alternative to the Pichia expression system, the
target proteins are expressed as glutathione-S-transferase
(GST) fusion proteins in E. coli. The target protein cDNAs
are cloned into the pGEX-KG vector (Guan and Dixon, 1991,
Anal. Biochem. 192:262-267) in which the protein of interest
is expressed as a C-terminus fusion with the GST protein.
The pGEX-KG plasmid has an engineered thrombin cleavage site
at the fusion junction that is used to cleave the target
protein from the GST tag. Expression is inducible in the
presence of IPTG, since the GST gene is under the influence
of the tac promoter. Induced cells are broken up by
sonication and the GST-fusion protein is affinity purified
onto a glutathione-linked affinity matrix. The bound
protein is then cleaved by the addition of thrombin to the
affinity matrix and recovered by washing, while the GST tag
remains bound to the matrix. Milligram quantities of
recombinant protein per liter of E. coli culture are expected
to be obtainable in this manner.

6.3. SYNTHESIS AND SCREENING OF POLYSOME-BASED
LIBRARIES ENCODING RANDOM CONSTRAINED
PEPTIDES OF VARIOUS LENGTHS

6.3.1. PREPARATION OF DNA TEMPLATES
DNA libraries with a high degree of complexity are made
as two components: an expression unit, and a semi-random (or
degenerate) unit. The expression unit has been synthesized
chemically as an oligonucleotide (termed T7RBSATG), and
contains the promoter region for bacteriophage T7 RNA
polymerase, a ribosome binding site, and an ATG
codon. The random region, also synthesized as an
oligonucleotide (termed MNN6), contains a region complementary
to the expression unit, the antisense version of the codons
specifying Cys-X6-Cys, and a restriction site (BstXI). The
library is constructed by annealing 100 pmol of
oligonucleotide T7RBSATG [having the sequence
5'-ACTTCGAAATTAATACGACTCACTATAGGGAGACCACAACGGTTTCCCTCCAGAAATAATTTTGTTTAACTTTAACTTTAAGAAGGAGATATACATATGCAT-3'
(SEQ ID NO:2)] and oligonucleotide MNN6 [having the sequence
5'-CCCAGACCCGCCCCCAGCATTGTGGGTTCCAACGCCCTCTAGACA[MNN]6ACAATGTATATCTCCTTCTT-3'
(SEQ ID NO:3); M = A or C, N = G, A, T, or
C], and extending the DNA in a reaction mixture containing
10-100 units of Sequenase (United States Biochemical Corp.,
Cleveland, OH), all four dNTPs (at 1 mM), and 10 mM
dithiothreitol for 30 min at 37°C. The extended material is
then digested with BstXI, ethanol precipitated and
resuspended in water. This fragment of DNA is then ligated
via the BstXI end to a 250 base pair (bp), PCR-amplified
glycine-serine coding fragment derived from gene III of M13
bacteriophage DNA. The gene III fragment has been amplified
by use of two primers, respectively termed FGSPCR [having the
sequence 5'-TCGTCTGACCTGCCTCAACCTCCCCACAATGCTGGCGGCGGCTCTGGT-3'
(SEQ ID NO:4)] and RGSPCR [having the sequence
5'-ATCAAGTTTGCCTTTACCAGCATTGTGGAGCGCGTTTTCATC-3'
(SEQ ID NO:5)], and Taq DNA polymerase (Gibco-BRL). The
amplified DNA (250 bp) was cut with BstXI to yield a 200 bp
fragment that has been gel purified. The 200 bp fragment is
then ligated to the random peptide coding DNA fragment. This
DNA specifies the synthesis of a peptide of the sequence
Met-His-Cys-(X)6-Cys (SEQ ID NO:6) fused to the Gly-Ser rich
region of the M13 gene III protein. The Gly-Ser rich domain
is thought to behave as a flexible linker and assist in
presentation of the random peptide to the target molecules.

To make constrained random peptides of different
lengths, oligonucleotides are made that are similar to MNN6,
except that the degenerate region is 5, 7, 8, and 9 codons
long. In addition, oligonucleotides are made that code for
various shapes of constrained random peptides by specifying
sequences comprising three cysteine residues interspersed
between 6-10 randomly specified amino acids.
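
Because the degenerate region is written in the antisense orientation (MNN, with M = A or C), the coding strand carries the reverse complement of each such triplet. The sketch below enumerates those sense codons and translates them with the standard genetic code, to show which amino acids and stop codons a single degenerate position can specify. The variable and function names are illustrative assumptions; only the MNN definition is taken from the text.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sense_codons_for_mnn():
    """Sense-strand codons encoded by the antisense triplet MNN."""
    antisense = ("".join(t) for t in product("AC", "ACGT", "ACGT"))
    return sorted(reverse_complement(t) for t in antisense)


codons = sense_codons_for_mnn()
amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
stop_codons = [c for c in codons if CODON_TABLE[c] == "*"]
```

Under the standard code, the 32 sense codons all end in G or T, cover all 20 amino acids, and include a single stop codon (TAG). A six-codon degenerate region therefore spans 32^6 DNA sequences encoding up to 20^6 distinct peptides.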

6.3.2. IN VITRO SYNTHESIS AND
ISOLATION OF POLYSOMES

An E. coli S30 extract is prepared from the B strain
SL119 (Promega). Coupled transcription-translation reactions
are performed by mixing the S30 extract with the S30 premix
(containing all 20 amino acids), the linear DNA template
coding for peptides of random sequences (prepared as
described in Section 6.3.1 above), and rifampicin at 20
µg/ml. The reaction is initiated by the addition of 100
units of T7 RNA polymerase and continues at 37°C for 30 min.
The reaction is terminated by placing the reactions on ice
and diluting them 4-fold with polysome buffer (20 mM Hepes-NaOH,
pH 7.5, 10 mM MgCl2, 1.5 µg/ml chloramphenicol, 100
µg/ml acetylated bovine serum albumin, 1 mM dithiothreitol,
20 units/ml RNasin, and 0.1% Triton X-100). Polysomes are
isolated from a 50 µl reaction programmed with 0.5-1 µg of
linear DNA template specifying the synthesis of random
constrained peptides. To isolate polysomes, the diluted S30
reaction mixtures are centrifuged at 288,000 x g for 30-40
min at 4°C. The pellets are suspended in polysome buffer and
centrifuged a second time at 10,000 x g for 5 min to remove
insoluble material.

6.3.3. AFFINITY SELECTION/SCREENING OF POLYSOMES
The isolated polysomes are incubated in microtiter wells
coated with the target proteins. Microtiter wells are
uniformly coated with 1-5 µg of 6-His tagged, or glutathione
S-transferase fused, target proteins (see Section 6.2
hereinabove). Target proteins that are used include the
oncoproteins ras and raf, KDR (the vascular endothelial
growth factor [vEGF] receptor protein) and vEGF. The
microtiter wells are coated with 1-5 µg of these target
proteins by incubation in PBS (phosphate-buffered saline; 10
mM sodium phosphate, pH 7.4, 140 mM NaCl, 2.7 mM KCl), for 1-5
hours at 37°C. The wells are then washed with PBS, and the
unbound surfaces of the wells blocked by incubation with PBS
containing 1% nonfat milk for 1 hr at 37°C. Following a wash
with polysome buffer, each well is incubated with polysomes
isolated from a single 50 µl reaction for 2-24 hr at 4°C.
Each well is washed five times with polysome buffer and the
associated mRNA is eluted with polysome buffer containing 20
mM EDTA.

After affinity selection of the polysomes, the
associated mRNAs are isolated, and treated with 5-10 units of
DNase I (RNase-free; Ambion) for 15 min at 37°C after
addition of MgCl2 to 40 mM. The mRNA is phenol-extracted and
ethanol-precipitated and dissolved in 20 µl of RNase-free
water. A portion of the mRNA is used for cDNA preparation
and subsequent amplification using 15 pmol each of primers
RGSPCR [5'-ATCAAGTTTGCCTTTACCAGCATTGTGGAGCGCGTTTTCATC-3'
(SEQ ID NO:5)] and SELEXF1
[5'-ACTTCGAAATTAATACGACTCACTATAGGGAGACCACAACGGTTTCC-3'
(SEQ ID NO:9)] and the rTth Reverse Transcriptase RNA PCR kit
(Perkin Elmer Cetus). Specifically, the mRNA is reverse-transcribed
into cDNA in a 20 µl reaction containing 1 pg
mRNA, 15 pmol of RGSPCR primer, 200 µM each of dGTP, dATP,
dTTP, and dCTP, 1 mM MnCl2, 10 mM Tris-HCl, pH 8.3, 90 mM KCl,
and 5 units of rTth DNA polymerase at 70°C for 15 min. In
the next step, the cDNA is amplified by the addition of 2.5
mM MgCl2, 8% glycerol, 80 mM Tris-HCl, pH 8.3, 125 mM KCl,
0.95 mM EGTA, 0.6% Tween 20, and 15 pmol of the SELEXF1
primer. The reaction conditions that are employed are 2 min
at 95°C for one cycle, 1 min at 95°C and 1 min at 60°C for 35
cycles, and 7 min at 60°C for one cycle. The amplified
product is then gel-purified and quantitated by
spectrophotometry at 260 nm. A portion of the amplified DNA
is digested with NsiI and XbaI and the resulting 30 base pair
fragment is directionally cloned into a monovalent phage
display vector. The DNAs inserted in the monovalent phage
display vector are then sequenced to determine the identity
of the peptides that were selectively retained by one cycle
of affinity binding to the target protein. A second portion
(0.5-1 µg) of the amplified DNA is subjected to another cycle
of affinity selection, mRNA isolation, cDNA amplification,
and cloning.

6.4. PHAGEMID SCREENING
Three different protocols for screening of a phagemid
library are presented in the subsections hereinbelow. These
protocols, particularly the immobilization and binding steps,
are readily adaptable to use for screening of different
libraries, e.g., polysome libraries. Preferably, different
methods are used in different rounds of screening.

6.4.1. PLATE PROTOCOL
In this example, a protocol is presented for screening a
phagemid library, in which in the first round of screening, a
biotinylated target protein is immobilized (by the specific
binding between biotin and streptavidin) on a streptavidin
coated plate. The immobilized target protein is then
contacted with library members to select binders.

Reagents Used:

Purified target protein, microfuge tubes, Falcon 2059,
Binding Buffer, Wash Buffer, Elute Buffer, phage display
library of >10^[illegible] pfu/screened target, fresh overnight culture
of appropriate host cells, LB agar plates with antibiotics as
needed, biotinylating agent NHS-LC-Biotin (Pierce Cat.
#21335), streptavidin, 50 mM NaHCO3 pH 8.5, 1 M Tris pH 9.1,
M280 Sheep anti-mouse IgG coated Dynabeads (Dynal), phosphate
buffered saline (PBS), Falcon 1008 petri dishes.

Wash Buffer = 1X PBS (Sigma tablets), 1 mM MgCl2, 1 mM CaCl2,
0.05% Tween 20. (For one liter: 5 PBS tablets, 1 ml 1 M MgCl2,
1 ml 1 M CaCl2, 0.5 ml Tween 20, nanopure H2O to 1 liter.)
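
The one-liter Wash Buffer recipe can be scaled to other volumes and checked against the stated final concentrations. The sketch below is illustrative only: the dictionary keys and function names are assumptions, while the per-liter amounts are those given in the recipe.

```python
# Amounts per liter of Wash Buffer, as listed in the recipe.
PER_LITER = {
    "PBS tablets": 5.0,
    "1 M MgCl2 (ml)": 1.0,
    "1 M CaCl2 (ml)": 1.0,
    "Tween 20 (ml)": 0.5,
}


def wash_buffer_recipe(final_volume_ml):
    """Scale the per-liter amounts to the requested final volume."""
    scale = final_volume_ml / 1000.0
    return {name: amount * scale for name, amount in PER_LITER.items()}


def final_concentration_mM(stock_molar, stock_ml, final_volume_ml):
    """C1 * V1 = C2 * V2, with the stock converted from M to mM."""
    return stock_molar * 1000.0 * stock_ml / final_volume_ml
```

One milliliter of a 1 M stock in one liter gives 1 mM, and 0.5 ml of Tween 20 in one liter is 0.05% by volume, so the recipe agrees with the stated composition.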

Binding Buffer = Wash Buffer with 5 mg/ml bovine serum 
albumin (BSA) . 

Elute Buffer = 0.1 N HCl adjusted to pH 2.2 with glycine;
1 mg/ml BSA.

Procedure:
Protein Biotinylation:

1. Wash 50-100 µg of target protein in 50 mM NaHCO3 pH 8.5
in a Centricon (Amicon) of the appropriate molecular weight
cut-off.

2. Bring the total volume to 100 µl with 50 mM NaHCO3 pH
8.5.

3. Dissolve 1 mg of NHS-LC-Biotin in 1 ml H2O. Do not store
this solution.

4. Immediately add 37 µl of the NHS-LC-Biotin solution to
the target protein and incubate for 1 hr at room temperature
(RT).

5. Remove the unreacted biotin by washing 2X with PBS in a
Centricon (Amicon) of the appropriate molecular weight
cutoff. Store the biotinylated protein at 4°C.

Coating a 1008 Plate with Streptavidin:

6. The night before the binding experiment precoat a 1008
plate with streptavidin.

7. Add 10 µg of streptavidin (1 mg/ml H2O) per 1 ml of 50 mM
NaHCO3 pH 8.5.

8. Add 1 ml of this solution to each plate and place in a
humidified chamber overnight at 4°C.

Prebinding; Blocking Non-Specific Sites:

9. To a streptavidin coated plate add 400 µl of Binding
Buffer (BSA blocking) for one hour at room temperature.

10. Rinse wells six times with Wash Buffer by slapping dry
on a clean piece of labmat.

Binding; Specific Target/Phage Complexes Round 1:
11. Add 10 µg of biotinylated target protein in 400 µl of
Binding Buffer to the well and incubate for 2 hr at 4°C.

12. Add 4 µl of 10 mM biotin and swirl for 1 hr at 4°C.

13. Wash as in step 10.

14. Add concentrated phage library (>10^[illegible] pfu) in 400 µl of
Binding Buffer and swirl overnight at 4°C.

Washing and Elution:

15. Slap out binding mixture and wash as in step 10.

16. To elute bound phage add 400 µl of Elution Buffer and
rock at RT for 15 min.

17. Transfer the elution solution to a sterile 1.5 ml tube
which contains 75 µl of 1 M Tris pH 9.1. Vortex briefly.

Amplification of Round 1 Eluted Phage:

18. Plate all of the eluted round 1 phage by adding 157 µl
of phage to 200 µl of cells incubated overnight (previously
checked free of contamination) in three aliquots. Incubate
25 min in a 37°C water bath and then spread onto LB
agar/antibiotics plate containing 2% glucose.

19. Scrape plates with 5 ml of 2XYT (growth broth)/
Antibiotics/Glucose and leave swirling for 30 min at RT.

20. Add the appropriate amount of 2XYT/Antibiotics/Glucose
to bring the O.D. 600 down to 0.4 and then grow at 37°C at
250 rpm until the O.D. 600 reaches 0.8.

21. Remove 5 ml and add to it 1.25 x 10^[illegible] M13 helper phage.

22. Shake 30 min at 150 rpm and then 30 min at 250 rpm at
37°C.

23. Centrifuge 10 min at 3000 x g at RT.

24. Resuspend cells in 5 ml 2XYT with no glucose. (This step
removes glucose.)

25. Centrifuge as in step 23 and resuspend in 5 ml 2XYT with
kanamycin and the appropriate antibiotics (no glucose). Spin
18 hr at 37°C and 250 rpm.

26. Pellet cells at 10,000 x g and sterile filter the phage
containing supernatant, which is now ready for round 2
screening.

27. Titer the round 1 eluted phage stocks.

Binding; Specific Target/Phage Complexes Rounds 2-5:

6. Combine ~1 µg of biotinylated target protein with the
eluted and titered round 1 phage (10^[illegible] pfu) in 200 µl of
Binding Buffer and rock 4 hr at 4°C.

7. The night before the round 2 screening is started,
prewash 200 µl/target protein to be screened of sheep anti-mouse
IgG magnetic beads (M280 IgG Dynabeads) with 2X 1 ml of
Wash Buffer using the Dynal Magnet. Let the beads collect at
least 1 min before removing the buffer. Let the beads stand
15 sec to allow residual Binding Buffer to collect and remove
with a P200 Pipetman.

8. Resuspend the washed beads in 200 µl of Binding Buffer
and add 100 µl of mouse anti-biotin IgG (Jackson IRL). Rock
overnight at 4°C.

10. Wash the unbound anti-biotin IgG from the Dynabeads by
placing them on the Dynal magnet for at least 1 min and remove
all liquid as in Step 7. Remove the tube from the magnet and
resuspend the beads in 1 ml of Wash Buffer, rock at 4°C for
30 min, and return to the magnet. Again let the beads pellet
for 1 min; repeat this process 3 more times and resuspend the
beads in 400 µl of Binding Buffer.

10a. The coated beads are now ready for use
(100 µl/round/target protein). The remainder can be stored
for use for up to 2 weeks.

11. Add the 100 µl of anti-biotin coated Dynabeads (Step 10)
to the protein/phage fraction (Step 9) bringing the total
binding volume to 300 µl and rock for 2 hr at 4°C. Ensure
that the beads mix thoroughly with the phage/protein
solution.

Washing and Elution:

12. Place the binding reaction into the Dynal magnet and let
sit for 1 min.

13. Remove the solution using a P1000 Pipetman and discard.
Let the beads stand 15 sec to allow residual binding buffer
to collect and remove with a P200 Pipetman. Note serial
dilution depends upon all residual liquid being removed
(i.e., 5 µl into 500 is 100X washing; 50 µl into 500 is only
10X).
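
The note on residual liquid is a simple dilution calculation, sketched below. The function names are assumptions, and the factor is computed as wash volume divided by residual volume, which is the approximation the note itself uses (the residual volume is neglected in the total).

```python
def wash_dilution(residual_ul, wash_ul):
    """Dilution achieved by one wash, using the protocol's approximation."""
    return wash_ul / residual_ul


def cumulative_dilution(residual_ul, wash_ul, n_washes):
    """Overall dilution after several identical washes."""
    return wash_dilution(residual_ul, wash_ul) ** n_washes
```

With 5 µl left behind, three 500 µl washes dilute unbound phage about a million-fold; with 50 µl left behind, the same three washes achieve only a thousand-fold dilution.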

14. Remove the tube from the magnet and resuspend the beads 
25 in 750 m1 of Wash Buffer and return to the magnet. Again let 

the beads pellet by waiting l min. 

15. Remove the Wash solution as in Step 7 and repeat this 
process several more times. 

16. After the removal of the final wash, resuspend the beads
and transfer them to a fresh, labeled tube and wash once
more.

17. To elute bound phage, add 400 μl of Elution Buffer,
titrate and rock for 14 min at RT.

18. Place the tube on the magnet for one minute and transfer
the eluate to a sterile 1.5 ml tube which contains 75 μl of
1 M Tris pH 9.1. Vortex briefly.

Amplification of Round 2-5 Eluted Phage:

15a. Plate 10 μl and 100 μl of round 2, 3, 4 eluates using
200 μl of contamination-free (previously tested) E. coli
XL1-Blue cells onto each plate containing
tetracycline/ampicillin/glucose and tetracycline/ampicillin
and amplify as in Steps 17-25.

BIOTIN-ANTIBIOTIN IgG BEAD PROTOCOL

In this example, a protocol is presented for screening a
phagemid library, in which a biotinylated target protein is
immobilized (by the specific binding between anti-biotin
antibodies and biotin) on a magnetic bead containing
anti-biotin antibodies on the bead surface. The immobilized
target protein is then contacted with library members to
select binders.

Reagents Used: 

M280 Sheep anti-Mouse IgG coated Dynabeads (Dynal)

Binding; Specific Target/Phage Complexes Round 1:

6. Combine 10 μg of biotinylated target protein with the
phage library (>10» pfu) in 400 μl of Binding Buffer and rock
overnight at 4°C.

7. That same night prewash 50 μl sheep anti-mouse IgG
magnetic beads (M280 IgG Dynabeads) with 500 μl of Binding
Buffer twice using the Dynal magnet. Let the beads collect
at least 1 min before removing the buffer. Let the beads
stand 15 sec to allow residual binding buffer to collect and
remove with a P200 Pipetman.

8. Resuspend the washed beads in 100 μl of Binding Buffer
and add 33 μl of mouse anti-biotin IgG (40 μg, Jackson IRL).
Rock overnight at 4°C.

9. Remove unbound protein from the phage/protein reaction
in Step 6 with a Microcon 100. Spin at 800 × g until
exclusion volume is met and wash twice with Wash Buffer
(again at 800 × g). Collect phage/protein with a Pipetman and
add an additional 50 μl of Wash Buffer to the Microcon,
gently titrate and combine with first fraction to ensure
maximal recovery.

10. Wash the unbound anti-biotin IgG from the Dynabeads by
placing them on the Dynal magnet for at least 1 min and remove
all liquid as in Step 7. Remove the tube from the magnet and
resuspend the beads in 750 μl of Wash Buffer, rock at 4°C for
30 min, and return to the magnet. Again, let the beads
pellet for 1 min; repeat this process 3 more times and
resuspend the beads in 100 μl of Binding Buffer.

11. Add the anti-biotin coated Dynabeads (Step 10) to the
protein/phage fraction (Step 9), bring the total binding
volume to 500 μl with Binding Buffer, and rock for 2 hr at
RT. Ensure that the beads mix thoroughly with the
phage/protein solution.

Washing and Elution: 

12. Place the binding reaction into the Dynal magnet and let 
sit for 1 min. 

13. Remove the solution using a P1000 Pipetman and discard.
Let the beads stand 15 sec to allow residual binding buffer
to collect and remove with a P200 Pipetman. Note that serial
dilution depends upon all residual liquid being removed
(i.e., 5 μl into 500 is 100X washing; 50 μl into 500 is only
10X).

14. Remove the tube from the magnet and resuspend the beads
in 750 μl of Wash Buffer and return to the magnet. Again let
the beads pellet by waiting 1 min.

15. Remove the wash solution as in Step 7 and repeat this
process 3 more times.

16. After the removal of the fourth wash, resuspend the
beads and transfer them to a fresh, labeled tube and wash
once more.

17. To elute bound phage, add 400 μl of Elution Buffer,
titrate and rock for 14 min at RT.

18. Place the tube on the magnet for one minute and transfer
the eluate to a sterile 1.5 ml tube which contains 75 μl of
1 M Tris pH 9.1. Vortex briefly.

Amplification of Round 1 Eluted Phage:

17. Plate all of the eluted round 1 phage by adding 157 μl
of phage to 200 μl of cells incubated overnight (previously
checked to be free of contamination) in three aliquots.
Incubate 25 min in a 37°C water bath and then spread onto LB
agar/antibiotics plate containing 2% glucose. Place plates
upright in 37°C incubator until dry and then invert and
incubate overnight.

18. Scrape plates with 5 ml of 2XYT/Antibiotics/Glucose and
leave swirling for 30 min at RT.

19. Add the appropriate amount of 2XYT/Antibiotics/Glucose
to bring the O.D. 600 down to 0.4 and then grow at 37°C at
250 rpm until the O.D. 600 reaches 0.8.

20. Remove 5 ml and add to it 1.25 x 10" M13 helper phage. 

21. Shake 30 min at 150 rpm and then 30 min at 250 rpm at
37°C.

22. Centrifuge 10 min at 3000 × g at RT.

23. Resuspend cells in 5 ml 2XYT with no glucose. (This step
removes glucose.)

24. Centrifuge as in step 22 and resuspend in 5 ml 2XYT with
kanamycin and the appropriate antibiotics (no glucose). Shake
18 hr at 37°C and 250 rpm.

25. Pellet cells at 10,000 × g and sterile filter the
phage-containing supernatant, which is now ready for round 2
screening.
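
Step 19 of the amplification leaves "the appropriate amount" of medium to the operator. As a rough aid, the sketch below estimates that volume under the assumption that O.D. 600 scales linearly with dilution; the protocol does not state this, and it typically holds only at moderate densities. The function name and the example reading are hypothetical.

```python
def medium_to_add_ml(culture_ml, od_measured, od_target=0.4):
    """Volume of fresh medium (ml) needed to dilute a culture to od_target.

    Assumes O.D. 600 is proportional to cell density, i.e. C1*V1 = C2*V2.
    """
    if culture_ml <= 0 or od_measured <= 0 or od_target <= 0:
        raise ValueError("volume and optical densities must be positive")
    if od_measured <= od_target:
        return 0.0  # already at or below the target; nothing to add
    final_volume_ml = culture_ml * od_measured / od_target
    return final_volume_ml - culture_ml


# Hypothetical reading: 5 ml of scraped cells at O.D. 600 = 2.0
print(round(medium_to_add_ml(5.0, 2.0), 2))
```

In practice the diluted culture would be re-measured before growing it on to O.D. 600 = 0.8, since the linearity assumption weakens for dense suspensions.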

Binding; Specific Target/Phage Complexes Round 2, 3, & 4:

6a. Bind 1 μg of target protein with 100 μl of amplified
phage from the previous round as before, overnight at 4°C.

7a. Prepare the IgG anti-biotin/anti-IgG beads as in Steps
7-10 using, however, only 20 μl of sheep anti-mouse IgG and
13 μl of anti-biotin IgG.

8a. All other binding procedures are identical with Steps 6-11.

Washing and Elution:

9a. Place the binding reaction into the Dynal magnet and let
sit for 1 min.

10a. Remove the solution and discard using a P1000 Pipetman.
Let the beads stand 30 sec to allow residual Binding Buffer
to collect and remove with a P200 Pipetman.

11a. Remove the tube from the magnet and resuspend the beads
in 750 μl of Wash Buffer and return to the magnet. Again let
the beads pellet by waiting 1 min.

12a. Remove the wash solution as in Step 11a and repeat this
process 3 more times.

13a. After the removal of the fourth wash, resuspend the
beads and transfer them to a fresh, labeled tube and wash 4
more times.

14a. Elute and neutralize as in Step 15.

Amplification of Rounds 2, 3, & 4 Eluted Phage:

15a. Plate 10 μl and 100 μl of round 2, 3, 4 eluates and
amplify as in Steps 17-25.

6.4.3. BIOTIN-STREPTAVIDIN, MAGNETIC
BEAD PROTOCOLS

In this example, a protocol is presented for screening a
phagemid library, in which a biotinylated target protein is
immobilized (by the specific binding between biotin and
streptavidin) on a streptavidin coated magnetic bead. The
immobilized target protein is then contacted with library
members to select binders.

Reagents Used:

Purified target protein, M280 streptavidin coated Dynabeads 
(Dynal) 

Binding; Specific Target/Phage Complexes Round 1:

6. Combine 10 μg of biotinylated target protein with the
phage library (>10" pfu) in 400 μl of Binding Buffer and rock
overnight at 4°C.

7. Remove unbound protein with a Microcon 100. Spin at
800 × g until exclusion volume is met, and wash twice with
Wash Buffer (again at 800 × g). Collect phage/protein with a
Pipetman and add an additional 50 μl of Wash Buffer to the
Microcon, gently titrate and combine with the first fraction
to ensure maximal recovery.

8. Prewash 50 μl (per reaction) of streptavidin magnetic
beads (M280 streptavidin Dynabeads) twice with 500 μl of
Washing Buffer using the Dynal magnet.

9. Add the prewashed Dynabeads to the protein/phage fraction
(add Binding Buffer to a total of 500 μl) and rock for 30 min.
Ensure that the beads mix thoroughly with the phage/protein
solution.



Washing and Elution:

10. Place the binding reaction into the Dynal magnet and let
sit for 1 min.

11. Remove the solution using a P1000 Pipetman and discard.
Let the beads stand 15 sec to allow residual Binding Buffer to
collect and remove with a P200 Pipetman. Note that serial
dilution depends upon all residual liquid being removed (i.e.,
5 μl into 500 is 100X washing; 50 μl into 500 is only 10X).

12. Remove the tube from the magnet and resuspend the beads
in 750 μl of Wash Buffer and return to the magnet. Again let
the beads pellet by waiting 1 min.

13. Remove the wash solution as in step 11 and repeat this
process 3 more times.

14. After the removal of the fourth wash, resuspend the beads
and transfer them to a fresh, labeled tube and wash once more.

15. To elute bound phage add 400 μl of Elution Buffer,
titrate and rock for 14 min at RT.

16. Place the tube on the magnet for one minute and transfer
the eluate to a sterile 1.5 ml tube which contains 75 μl of
1 M Tris pH 9.1. Vortex briefly.

Amplification of Round 1 Eluted Phage:

17. Plate all of the eluted round 1 phage by adding 157 μl of
phage to 200 μl of overnight cells (previously checked to be
free of contamination) in three aliquots. Incubate 25 min in
a 37°C water bath and then spread onto LB agar/antibiotics
plate containing 2% glucose. Place plates upright in 37°C
incubator until dry and then invert and incubate overnight.

18. Scrape plates with 5 ml of 2XYT/Antibiotics/Glucose and
leave swirling for 30 min at RT.

19. Add the appropriate amount of 2XYT/Antibiotics/Glucose
to bring the O.D. 600 down to 0.4 and then grow at 37°C at 250
rpm until the O.D. 600 reaches 0.8.

20. Remove 5 ml and add to it 1.25 x 10" M13 helper phage. 

21. Shake 30 min at 150 rpm and then 30 min at 250 rpm at
37°C.

22. Centrifuge 10 min at 3000 × g at RT.

23. Resuspend cells in 5 ml 2XYT with no glucose. (This step
removes glucose.)

24. Centrifuge as in step 22 and resuspend in 5 ml 2XYT with
kanamycin and the appropriate antibiotics (no glucose). Shake
18 hr at 37°C and 250 rpm.

25. Pellet cells at 10,000 × g and sterile filter the
phage-containing supernatant, which is now ready for round 2
screening.

Binding; Specific Target/Phage Complexes Round 2, 3, & 4:

6a. Combine 1 μg of biotinylated target protein with 100 μl
of the previous round's phage (>10' pfu) in 400 μl of Binding
Buffer and rock overnight at 4°C.

7a. Remove unbound protein with a Microcon 100. Spin at
800 × g until exclusion volume is met and wash twice with Wash
Buffer (again at 800 × g). Collect phage/protein with a
Pipetman and add an additional 50 μl of Wash Buffer to the
Microcon, gently titrate and combine with the first fraction
to ensure maximal recovery.

8a. Prewash 20 μl (per reaction) of streptavidin magnetic
beads (M280 streptavidin Dynabeads) twice with 500 μl of
Washing Buffer using the Dynal magnet.

9a. Add the prewashed Dynabeads to the protein/phage fraction
and rock for 30 min. Add Binding Buffer to a total of 500 μl.
Ensure that the beads mix thoroughly with the phage/protein
solution.

Washing and Elution:

10a. Place the binding reaction into the Dynal magnet and let
sit for 1 min.

11a. Remove the solution and discard using a P1000 Pipetman.
Let the beads stand 30 sec to allow residual Binding Buffer to
collect and remove with a P200 Pipetman.

12a. Remove the tube from the magnet and resuspend the beads
in 750 μl of Wash Buffer and return to the magnet. Again let
the beads pellet by waiting 1 min.

13a. Remove the wash solution as in Step 11a and repeat this
process 3 more times.

14a. After the removal of the fourth wash resuspend the beads
and transfer them to a fresh, labeled tube and wash 4 more
times.

15a. Elute and neutralize as in Step 15.

Amplification of Rounds 2, 3, & 4 Eluted Phage:

16a. Plate 10 μl and 100 μl of round 2, 3, 4 eluates and amplify
as in Steps 17-25.
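
The washing steps of these protocols note that the effective dilution depends on how much liquid is left on the beads (5 μl carried into 500 is 100X; 50 μl into 500 is only 10X). The sketch below restates that arithmetic and compounds it over several washes. It uses the protocol's own approximation (wash volume divided by residual volume); the function names and the choice of five washes in the example are illustrative assumptions.

```python
def wash_fold(residual_ul, wash_ul=500.0):
    """Fold dilution from a single wash, using the protocol's approximation:
    wash volume divided by the residual volume carried over."""
    if residual_ul <= 0 or wash_ul <= 0:
        raise ValueError("volumes must be positive")
    return wash_ul / residual_ul


def cumulative_fold(residual_ul, n_washes, wash_ul=500.0):
    """Overall fold dilution after n_washes identical washes."""
    if n_washes < 0:
        raise ValueError("n_washes must be non-negative")
    return wash_fold(residual_ul, wash_ul) ** n_washes


print(wash_fold(5.0))            # 5 ul carried over
print(wash_fold(50.0))           # 50 ul carried over
print(cumulative_fold(5.0, 5))   # five washes, 5 ul residual each time
print(cumulative_fold(50.0, 5))  # five washes, 50 ul residual each time
```

Because the per-wash factor is raised to the number of washes, a tenfold difference in residual volume compounds to a 10^5-fold difference in background over five washes, which is why the protocols insist on removing the last traces of buffer with a P200.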

6.5. AFFINITY MEASUREMENTS OF
PEPTIDE-TARGET PROTEIN INTERACTIONS

Once peptides that bind to a target protein have been
identified, the affinity of each peptide for its target is
determined by measuring the dissociation constant (Kd) of the
peptide-target complex. Oligonucleotides that encode the
peptides are constructed so as to encode also an epitope tag
fused to the peptide (for example, the myc epitope) that can
be detected by a commercially available antibody. These
oligonucleotides are incubated with polysome extracts to
produce the peptide tagged with the epitope. Binding of the
target protein to the peptide is done in solution, and
separation of the bound peptide from the unbound peptide is
done by immunoaffinity purification using an anti-target
protein antibody. This immunoaffinity purification is done by
a modified ELISA (enzyme-linked immunosorbent assay) protocol,
in which the target protein-peptide mixture is exposed to the
anti-target protein antibody immobilized on a solid support
such as a nitrocellulose membrane, and the unbound peptide is
then washed off. In this protocol, the concentration of the
target protein is varied and then the amount of bound peptide
is estimated by detecting the epitope tag on the peptide by
use of anti-epitope antibody. In this manner the affinity of
each peptide for its target protein can be determined.
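
The text does not say how Kd is extracted from the titration. One common approach, sketched here purely as an assumption, is to model simple 1:1 binding with the target in large excess over the peptide, so that the bound fraction is [T] / (Kd + [T]), and to choose the Kd that minimizes the squared error against the normalized signal. The function names, the grid search, and the synthetic data are all illustrative.

```python
def fraction_bound(target_conc, kd):
    """Fraction of peptide bound for 1:1 binding with target in excess."""
    return target_conc / (kd + target_conc)


def fit_kd(target_concs, fractions, kd_lo=1e-12, kd_hi=1e-3, steps=1800):
    """Least-squares Kd estimate (same units as target_concs) found by
    scanning a log-spaced grid between kd_lo and kd_hi."""
    best_kd, best_err = None, float("inf")
    for i in range(steps + 1):
        kd = kd_lo * (kd_hi / kd_lo) ** (i / steps)
        err = sum((fraction_bound(t, kd) - f) ** 2
                  for t, f in zip(target_concs, fractions))
        if err < best_err:
            best_kd, best_err = kd, err
    return best_kd


# Synthetic, noise-free titration generated with Kd = 1e-6 M (illustrative).
concs = [1e-8, 1e-7, 1e-6, 1e-5, 1e-4]
signal = [fraction_bound(c, 1e-6) for c in concs]
print(f"estimated Kd = {fit_kd(concs, signal):.2e} M")
```

Real ELISA signals would first have to be background-corrected and normalized to the saturating signal before such a fit is meaningful.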

6.6. REDOR MEASUREMENTS ON A CX6C PEPTIDE RESIN

This example demonstrates successful synthesis and
cyclization of a CX6C peptide resin of greater than 95% purity
and with a labeled glycine followed by successful REDOR
distance measurements on the CX6C peptide resin using the
preferred REDOR methods of this invention. The labeled
peptide used was

Cys-Asn-Thr-Leu-Lys-(15N-2-13C)Gly-Asp-Cys-Gly-mBHA resin, where
a glycine linker attached the peptide of interest to the mBHA
resin. (Cys-Asn-Thr-Leu-Lys-Gly-Asp-Cys-Gly = SEQ ID NO: 10)

The peptide resin was synthesized by solid phase
synthesis on p-MethylBenzhydrilamine (mBHA) resin using a
combination of Boc and Fmoc chemistry. MethylBenzhydrilamine
resin (Subst. 0.36 meq/g) was purchased from Advanced Chem
Tech (Louisville, KY). Fmoc-(15N-2-13C)Gly was prepared from
HCl, (15N-2-13C)Gly (Isotec Inc., Miamisburg, OH) and Fmoc-OSu.
Boc-Gly, Fmoc-Cys(Trt), Fmoc-Asp(OtBu), Fmoc-Lys(Boc), Fmoc-Leu,
Fmoc-Thr(OtBu), Fmoc-Asn and Boc-Cys(Acm) were purchased from
Bachem (Torrance, CA). Reagent grade solvents were purchased
from Fisher Scientific. Diisopropylcarbodiimide (DIC),
trifluoroacetic acid (TFA) and diisopropylethylamine (DIEA)
were purchased from Chem Impex (Wooddale, IL). Nitrogen and HF
were purchased from Air Products (San Diego, CA).
The first step 43 was the synthesis of
Boc-Cys(Acm)-Asn-Thr(OtBu)-Leu-Lys(Boc)-Gly-Asp(OtBu)-
Cys(Trt)-Gly-mBHA resin. 1.11 g (0.40 meq) of mBHA resin were
placed in a 150 ml reaction vessel (glass filter at the
bottom) with methylene chloride (CH2Cl2) [DCM] and stirred 15
min with a gentle bubbling of nitrogen in order to swell the
resin. The solvent was drained and the resin was neutralized
with DIEA 5% in DCM (3 × 2 min). After washes with DCM, the
resin was coupled 60 min with Boc-Gly (0.280 g, 1.6 meq, 4-fold
excess, 0.1 M) and DIC (0.25 ml, 1.6 meq, 4-fold excess, 0.1 M) in
DCM. Completion of the coupling was checked with the
ninhydrin test. After washes, the resin was stirred 30 min in
TFA 55% in DCM in order to remove the Boc protecting group.
The resin was then neutralized with DIEA 5% in DCM and coupled
with Fmoc-Cys(Trt) (0.937 g, 1.6 meq, 4-fold excess, 0.1 M) and DIC
(0.25 ml, 1.6 meq, 4-fold excess, 0.1 M) in DCM/DMF (50/50).
After washes the resin was stirred with piperidine 20% in DMF
(5 min and 20 min) in order to remove the Fmoc group. After
washes, this same cycle was repeated with Fmoc-Asp(OtBu),
Fmoc-(15N-2-13C)Gly (2-fold excess only), Fmoc-Lys(Boc),
Fmoc-Leu, Fmoc-Thr(OtBu), Fmoc-Asn and Boc-Cys(Acm). After the
last coupling, the Boc group was left on the peptide. The
resin was washed thoroughly with DCM and dried under a
nitrogen stream. Yield was 1.49 g (expected: ~1.7 g).
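
The reagent quantities quoted for the first coupling can be cross-checked from the resin loading. The short calculation below is only a sanity check: the molecular weights of Boc-Gly and DIC and the density of DIC are literature values assumed here (they are not given in the text), and 1 meq is treated as 1 mmol for these monofunctional reagents.

```python
# Values taken from the text
RESIN_G = 1.11              # mass of mBHA resin (g)
SUBSTITUTION_MEQ_G = 0.36   # resin substitution (meq/g)
EXCESS = 4                  # fold excess of amino acid and DIC
COUPLING_MOLAR = 0.1        # reagent concentration (M)

# Literature values assumed for this sketch (not stated in the source)
BOC_GLY_MW = 175.18         # g/mol
DIC_MW = 126.20             # g/mol
DIC_DENSITY = 0.815         # g/ml

resin_meq = RESIN_G * SUBSTITUTION_MEQ_G        # reactive sites on the resin
reagent_mmol = round(resin_meq, 2) * EXCESS     # 4-fold excess of each reagent
boc_gly_g = reagent_mmol * BOC_GLY_MW / 1000    # mass of Boc-Gly to weigh out
dic_ml = reagent_mmol * DIC_MW / 1000 / DIC_DENSITY  # volume of neat DIC
solvent_ml = reagent_mmol / COUPLING_MOLAR      # volume giving 0.1 M

print(f"resin sites : {resin_meq:.2f} meq")
print(f"reagent     : {reagent_mmol:.1f} mmol")
print(f"Boc-Gly     : {boc_gly_g:.3f} g")
print(f"DIC         : {dic_ml:.2f} ml")
print(f"solvent     : {solvent_ml:.0f} ml")
```

Under these assumptions the computed amounts agree with the quoted 0.40 meq, 0.280 g, and 0.25 ml, and imply roughly 16 ml of solvent per coupling.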

The next step 44 was cyclization of the
Boc-Cys-Asn-Thr(OtBu)-Leu-Lys(Boc)-Gly-Asp(OtBu)-Cys-Gly-mBHA
resin. 600 mg of protected peptide resin were sealed in a
polypropylene mesh packet. The bag was shaken in a mixture of
solvent (DCM/Methanol/Water: 640/280/47) in order to swell the
resin. The bag was then shaken 20 min in 100 ml of a solution
of iodine in the same mixture of solvent (0.4 mg I2/ml solvent
mixture). This operation was performed 4 times. No
decoloration was observed after the third time. The resin was
then thoroughly washed with DCM, DMF, DCM, and methanol
successively.

The last step 45 was side-chain deprotection of the
Cys-Asn-Thr-Leu-Lys-Gly-Asp-Cys-Gly-mBHA resin. After
cyclization the resin in the polypropylene bag was reacted 1.5
hours with 100 ml of a mixture of TFA/p-Cresol/Water (95/2.5/2.5).
After washes with DCM and methanol, the resin was dried 46
hours under vacuum. Yield was 560 mg.

The resulting peptide resin was analyzed for its purity
and the presence of the disulfide bridge. 40 mg of resin were
sealed in a propylene mesh packet and treated with HF at 0°C
for 1 hour in presence of anisole (HF/Anisole: 90/10). The
scavenger and by-products were extracted from the resin with
cold ethyl ether. The peptide was extracted with 10% acetic
acid and lyophilized 36 hours. The dry isolated peptide was
characterized by PDMS (mass spectrography) and HPLC (high
performance liquid chromatography). This analysis
demonstrated that greater than 95% of the product peptide was
of the correct amino acid composition, having a disulfide loop
and without inter-molecular disulfide dimers.

REDOR measurements were made on the peptide resin
prepared by this method, and as a control, also on dried
(15N-2-13C) labeled glycine. The preferred REDOR methods and
parameters, as previously detailed, were used. Fig. 6
illustrates the 15N resonance spectral signals obtained.
Signal 70 is the signal produced by dried glycine after no
rotor periods. Signals 71, 72, 73 are glycine signals after
2, 4, and 8 rotor periods, respectively. Signals 74, 75, 76,
and 77 are the peptide resin signals after 0, 2, 4, and 8
rotor periods, respectively.

Fig. 7 illustrates the data analysis. As in Fig. 5, axis
81 is the ΔS/S axis, and axis 82 is the X axis. The variables
are as used in equation 5. Graph 83 is defined by equation 5,
and is the initial rising part of the full curve shown in Fig.
5. Data points 84, 85, 86, and 87 are best fits of the data
for 0, 2, 4, and 8 rotor periods, respectively. At these
points, the circles represent the glycine values and the
squares the peptide resin values. These values correspond to
a C-N distance in glycine and the peptide of 1.55 Å (and a D_CN
of 800 Hz). Repeated measurements gave a C-N distance of
1.50 Å (and a D_CN of 875 Hz). The accepted distance in glycine
is 1.48 Å. The above procedure was repeated for (15N-1-13C)
labeled glycine in
Cys-Asn-Thr-Leu-Lys-(15N-1-13C)Gly-Asp-Cys-Gly-mBHA resin, and
the measured C-N distance of 2.50 Å is in excellent agreement
with the predicted value of 2.46 Å.

Thus REDOR accuracy to better than 0.1 Å is demonstrated.
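
The link between a measured dipolar coupling and an internuclear distance can be illustrated with the standard point-dipole relation D = (μ0/4π) γC γN ħ / (2π r³). The sketch below is not the specification's equation 5; it only converts between D and r for a 13C-15N pair, using physical constants assumed from standard tables. The function names are hypothetical.

```python
import math

# Physical constants (standard tabulated values, assumed for this sketch)
MU0_OVER_4PI = 1.0e-7        # T*m/A
HBAR = 1.054571817e-34       # J*s
GAMMA_13C = 6.728284e7       # rad/(s*T)
GAMMA_15N = 2.7126180e7      # rad/(s*T), magnitude only


def coupling_hz(r_angstrom):
    """13C-15N dipolar coupling (Hz) for a distance given in angstroms."""
    r_m = r_angstrom * 1e-10
    return MU0_OVER_4PI * GAMMA_13C * GAMMA_15N * HBAR / (2 * math.pi * r_m ** 3)


def distance_angstrom(d_hz):
    """Distance (angstroms) implied by a 13C-15N coupling given in Hz."""
    return (coupling_hz(1.0) / d_hz) ** (1.0 / 3.0)


for d in (800.0, 875.0):
    print(f"D = {d:.0f} Hz -> r = {distance_angstrom(d):.2f} A")
print(f"r = 2.46 A -> D = {coupling_hz(2.46):.0f} Hz")
```

With these constants the couplings quoted above map to distances within a few hundredths of an angstrom of the reported 1.55 Å and 1.50 Å. Because r varies as the inverse cube root of D, a 10% error in the coupling shifts the distance by only about 3%.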

Also demonstrated is the peptide resin as an appropriate
substrate for NMR measurements. Inter-molecular dipole-dipole
interactions between adjacent peptides did not interfere.
Also the overlap of the distances measured in free glycine and
in glycine incorporated in the peptide demonstrated that the
peptide was held sufficiently rigidly by the resin that any
remaining peptide motions did not interfere with the NMR
measurements.

7. SPECIFIC EMBODIMENTS, CITATION OF REFERENCES

The present invention is not to be limited in scope by
the specific embodiments described herein. Indeed, various
modifications of the invention in addition to those described
herein will become apparent to those skilled in the art from
the foregoing description and accompanying figures. Such
modifications are intended to fall within the scope of the
appended claims.

Various publications are cited herein, the disclosures of
which are incorporated by reference in their entireties.

WHAT IS CLAIMED IS:

1. A method of determining a consensus pharmacophore
structure comprising the steps of:

(a) identifying from one or more diversity libraries a
plurality of compounds that bind to a target
molecule,

(b) measuring one or more distances in one or more of
the compounds, and

(c) determining a consensus pharmacophore structure for
the compounds.

2. The method of claim 1 wherein said compounds are 
peptides, peptide derivatives, or peptide analogs. 

3. The method of claim 2 wherein said compounds are peptides
containing one or more cystines.

4. The method of claim 3 wherein the peptides comprise the
sequence CX6C (SEQ ID NO:1).

5. The method of claim 1 further comprising a step of
selecting a plurality of candidate pharmacophores based
on rules of chemical homology, the selected plurality of
candidate pharmacophores being used in step (c) to
determine the consensus pharmacophore structure.

6. The method of claim 5 wherein the rules of homology
determine that two candidate pharmacophores are
homologous if they have chemically similar side chains.

7. The method of claim 1 which further comprises after said
identifying step, a screening step involving a genetic
selection technique.

8. The method of claim 1 wherein the step of measuring
distance comprises making solid phase nuclear magnetic
resonance measurements on selected nuclei in a nuclear
magnetic resonance spectrometer upon a sample comprising
one of the compounds.

9. The method of claim 8 wherein the step of measuring
distances further comprises making rotational echo double
resonance nuclear magnetic resonance measurements of
internuclear dipole-dipole interaction strength between
selected nuclei in the compound in the sample.

10. The method of claim 8 wherein the sample further
comprises a substrate having a surface to which the
compound is attached.

11. The method of claim 8 wherein the sample is cooled below
room temperature.

12. The method of claim 8 wherein the compound is bound to
the target molecule.

13. The method of claim 10 wherein a plurality of the
compound is attached to the surface at a surface density
such that the inter-nuclear dipole-dipole interactions
between different molecules is less than 10% of the
inter-nuclear dipole-dipole interaction within one
molecule.

14. The method of claim 10 wherein the substrate has pores of
sufficient size to permit the target to diffuse and bind
to the compound in the sample.

15. The method of claim 9 wherein rotational echo double
resonance nuclear magnetic resonance measurements can be
made on the compound bound to the target or hydrated or
in a dry nitrogen atmosphere.



WO 96/30849    PCT/US96/04229



16. The method of claim 10 wherein the compound is a peptide,
and a plurality of the peptide is attached to the
substrate surface, which has a purity of the peptide of
at least 95% and wherein the surface density of the
peptide is no more than one peptide per 100 Å² of
substrate surface.

17. The method of claim 10 wherein the substrate is selected
from the group consisting of p-MethylBenzhydrilamine
resin, divinylbenzyl polystyrene resin, and glass beads.

18. The method of claim 8 wherein the selected nuclei are
selected from the group consisting of 13C, 19F, and 31P.

19. The method of claim __ wherein the nuclear magnetic
resonance spectrometer comprises magnetic excitation
means, a sample rotor, and free induction decay observing
means, and the step of making rotational echo double
resonance nuclear magnetic resonance measurements further
comprises the steps of:

(a) spinning the sample in the sample rotor,

(b) initially exciting magnetically the selected nuclei
to be observed,

(c) providing subsequently one π spin flip magnetic
excitation during each rotor period to each of the
selected nuclei, the pulses to the different nuclei
having fixed phase delays,

(d) observing the free induction decay signal as a
function of the number of rotor periods; and

(e) finding the dipole-dipole strength between the
selected nuclei, whereby the internuclear distance
between the selected nuclei can be obtained.



20. The method of claim 1 wherein the step of measuring
distances comprises making liquid phase nuclear magnetic
resonance measurements.

21. A method of determining a consensus pharmacophore
structure comprising the steps of:

(a) identifying from one or more diversity libraries a
plurality of compounds that bind to a target
molecule,

(b) determining a consensus pharmacophore structure for
the compounds.

22. A method of determining a consensus pharmacophore
structure comprising the steps of:

(a) measuring one or more distances in one or more
compounds that bind to a target molecule, and

(b) determining a consensus pharmacophore structure for
the compounds.

23. The method of claim 21 or 22 further comprising a step of
selecting a plurality of candidate pharmacophores based
on rules of chemical homology, the selected plurality of
candidate pharmacophores being used in step (b) to
determine the consensus pharmacophore structure.

24. The method of claim 23 wherein the compounds have limited
conformational degrees of freedom at the temperature of
interest, and wherein the step of determining a consensus
pharmacophore structure for each compound further
comprises performing a consensus configurational bias
Monte Carlo method, said Monte Carlo method comprising
the steps of:

(a) generating a proposed structure for a compound
identified from said one or more diversity libraries
by making conformational alterations consistent with
the conformational degrees of freedom, the
alterations being made to a representation of the
compound's current chemical and conformational
structure to generate a proposed representation, the
proposed structure being generated with a bias
toward more acceptable configurations of lower
energy, whereby the method is made more efficient,

(b) accepting and storing the proposed structure
according to a probability depending on an energy
determined for the proposed structure, and

(c) repeating these steps until sufficient structures
have been stored for each compound to permit
statistically significant determination of an
equilibrium structure for each compound.



25. A method of determining one or more lead compounds for
use as a drug that binds to a target molecule comprising
the steps of:

(a) identifying from one or more diversity libraries a
plurality of compounds that bind to a target
molecule;

(b) determining a consensus pharmacophore structure for
the compounds; and

(c) determining one or more lead compounds for use as a
drug which share a pharmacophore specification with
the determined consensus pharmacophore structure.

26. A method of determining one or more lead compounds for
use as a drug that binds to a target molecule comprising
the steps of:

(a) measuring one or more distances in one or more
compounds that bind to a target molecule;

(b) determining a consensus pharmacophore structure for
the compounds; and

(c) determining one or more lead compounds for use as a
drug which share a pharmacophore specification with
the determined consensus pharmacophore structure.



27. The method according to claim 25 or 26 wherein said step
of determining one or more lead compounds comprises
modifying a compound identified as binding to the target
molecule, said modification being done outside of the
pharmacophore structure, to render the compound more
attractive for use as a drug.

28. The method of claim 5 wherein the compounds have limited
conformational degrees of freedom at a temperature of
interest, and wherein the step of determining a consensus
pharmacophore structure for the compounds further
comprises performing a consensus configurational bias
Monte Carlo method, said Monte Carlo method comprising
the steps of:

(a) generating a proposed structure for a compound
identified from said one or more diversity libraries
by making conformational alterations consistent with
the conformational degrees of freedom, the
alterations being made to a representation of the
compound's current chemical and conformational
structure to generate a proposed representation, the
proposed structure being generated with a bias
toward more acceptable configurations of lower
energy,

(b) accepting and storing the proposed structure
according to a probability depending on an energy
determined for the proposed structure, and

(c) repeating these steps until sufficient structures
have been stored for each compound to permit
statistically significant determination of an
equilibrium structure for each compound.
29. The method of claim 28 wherein the limited conformational
degrees of freedom comprise torsional rotations about
mutual bonds between otherwise rigid subunits of the
compound, each rigid unit's representation comprising its
interconnections and atomic composition, each atom's
representation comprising its type and position, the
torsional rotations respecting any conformational
constraints present.

30. The method of claim 28 wherein the compound is a peptide,
peptide derivative, or peptide analog.

31. The method of claim 28 wherein the conformational
alterations comprise constrained, concerted torsional
rotations or removal of a side chain and regrowth of the
side chain with a new torsional conformation.

32. The method of claim 31 wherein the constrained, concerted
torsional rotations are constrained so that no more than
four rigid units are spatially displaced.

33. The method of claim 28 wherein determining the energy for
the proposed structure of one compound comprises
including one or more constraint terms which represent
knowledge of measured structure for the compound.

34. The method of claim 33 wherein the constraint terms
comprise a weighted sum of squares of differences of the
actual and measured structures.

35. The method of claim 28 wherein the energy is determined 
for the proposed structure of one compound by a method 
comprising including consensus terms which represent 
knowledge that the identified compounds all bind to the 
same target, the compounds being otherwise treated 
independently by the method. 

36. The method of claim 35 wherein the consensus terms are a
weighted sum of squares of differences in the atomic 
positions of a candidate pharmacophore from the average 
values of these positions in all the compounds. 
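
The consensus terms described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the candidate pharmacophore atoms of every compound are assumed to be already expressed in a common coordinate frame, and the function name, argument layout, and per-atom weights are hypothetical.

```python
def consensus_energy(pharmacophores, index, weights):
    """Weighted squared deviation of one compound's pharmacophore atoms
    from the average positions over all compounds.

    pharmacophores: one entry per compound, each a list of (x, y, z) tuples
                    with the same atom ordering (assumed pre-aligned).
    index:          which compound the energy is evaluated for.
    weights:        one weight per pharmacophore atom (assumed).
    """
    n_compounds = len(pharmacophores)
    n_atoms = len(pharmacophores[0])
    energy = 0.0
    for a in range(n_atoms):
        # Average position of atom `a` over all compounds.
        mean = [sum(p[a][k] for p in pharmacophores) / n_compounds
                for k in range(3)]
        # Squared distance of this compound's atom from that average.
        dev2 = sum((pharmacophores[index][a][k] - mean[k]) ** 2
                   for k in range(3))
        energy += weights[a] * dev2
    return energy
```

When all compounds place the candidate pharmacophore atoms at the same positions the term vanishes, which is the sense in which it rewards a shared binding geometry.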

37. The method of claim 35 wherein the step of determining
the consensus pharmacophore structure comprises
determining from the plurality of selected candidate
pharmacophores a candidate pharmacophore for which the
consensus terms are relatively small compared to the
total energy.

38. The method of claim 35 wherein the step of determining 
the consensus pharmacophore structure comprises
determining from the plurality of selected candidate
pharmacophores a candidate pharmacophore for which the 
consensus terms are minimum compared to other selected 
regions. 

39. The method of claim 28 wherein the equilibrium structure
is determined by a method comprising averaging selected
generated and accepted structures for each compound.

40. The method of claim 39 wherein the averaging of
structures comprises clustering selected generated and 
accepted structures into sets of similar structures and 
averaging these sets for each member. 
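
One way to picture the clustering and averaging of claims 39 and 40 is the sketch below. It assumes each stored structure is summarized by a list of torsion angles in radians, groups structures with a simple leader-style threshold rule, and averages each group with a circular mean. The distance measure, the threshold rule, and all names are assumptions; the claims do not prescribe a particular clustering method.

```python
import math


def angular_rms(a, b):
    """Root-mean-square difference of two torsion vectors, with wrap-around."""
    diffs = [math.atan2(math.sin(x - y), math.cos(x - y)) for x, y in zip(a, b)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def cluster_structures(structures, threshold):
    """Assign each structure to the first cluster whose founding member
    lies within `threshold`; otherwise start a new cluster."""
    clusters = []
    for s in structures:
        for c in clusters:
            if angular_rms(s, c[0]) <= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def circular_mean(cluster):
    """Average each torsion over a cluster using the circular mean."""
    n = len(cluster[0])
    return [math.atan2(sum(math.sin(s[i]) for s in cluster),
                       sum(math.cos(s[i]) for s in cluster))
            for i in range(n)]
```

Averaging within clusters rather than over all stored structures avoids blending distinct conformational families into a single, physically meaningless mean.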

41. A method of identifying a compound that binds to a target
molecule comprising the following steps in the order
stated:

(a) contacting compounds of a phage display or polysome-based
diversity library with a target molecule;

(b) identifying one or more compounds in the library
that bind to the target molecule;

(c) contacting one or more first fusion proteins, each
first fusion protein comprising an identified
compound, with a second fusion protein comprising
the target molecule or a binding portion thereof, in
which binding of the first fusion protein to the
second fusion protein results in an increase in
activity or activation of a transcriptional promoter
or an origin of replication; and

(d) identifying one or more of the compounds that when
present in said first fusion protein result in said
increase in activity or activation.

42. A method of making solid state nuclear magnetic resonance
measurements comprising measuring internuclear dipole-dipole
interaction strengths between selected nuclei in a
compound, said compound being attached to the surface of
a substrate.

43. The method of claim 42 which further comprises before 
said measuring step the step of synthesizing a plurality 
of said compound on the surface of the substrate. 

44. The method of claim 43 wherein said plurality of the 
compound is at least 95% pure. 

45. The method of claim 42 wherein a plurality of said 
compound is attached to the substrate surface, with at
least 10 Å spacing between molecules of the compound.



46. The method of claim 42 wherein the substrate has pores of
sufficient size to permit a molecule to diffuse and bind
to the compound.



47. The method of claim 42 wherein the substrate has a
surface density of the compound such that the internuclear
dipole-dipole interactions between different
molecules of the compound are less than 10% of the
internuclear dipole-dipole interaction within one molecule of
the compound.

48. The method of claim 42 wherein the compound is a peptide,
peptide derivative, or peptide analog.

49. The method of claim 42 wherein the substrate is selected
from the group consisting of p-methylbenzhydrylamine
resin, divinylbenzyl polystyrene resin, and a glass bead.

50. The method of claim 42 wherein said measuring step
comprises using a nuclear magnetic resonance
spectrometer, said spectrometer comprising magnetic
excitation means, a sample rotor, and free induction
decay observing means; and said measurement of
internuclear dipole-dipole interaction is done by a
method comprising the steps of:

(a) spinning the sample in the sample rotor;

(b) initially exciting magnetically the selected nuclei
to be observed;

(c) providing subsequently one or more π spin flip
magnetic excitations during each rotor period to one
or both of the selected nuclei, wherein pulses to
the different nuclei have fixed phase delays;

(d) observing a free induction decay signal as a
function of the number of rotor periods; and

(e) determining the dipole-dipole strength between the
selected nuclei, whereby the internuclear distance
between the selected nuclei can be obtained.
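
Step (e) relies on the standard inverse-cube relation between dipolar coupling strength and internuclear distance, D = (mu0/4pi) * |gamma1 * gamma2| * hbar / (2 * pi * r^3), with D in Hz. The sketch below applies that relation; the choice of a 13C-15N spin pair and the function names are assumptions for illustration, since the claim does not name the nuclei.

```python
import math

MU0_OVER_4PI = 1.0e-7       # T*m/A
HBAR = 1.054571817e-34      # J*s
GAMMA_13C = 67.2828e6       # rad/(s*T), gyromagnetic ratio of 13C
GAMMA_15N = -27.116e6       # rad/(s*T), gyromagnetic ratio of 15N


def dipolar_coupling_hz(r_m, gamma1, gamma2):
    """Dipolar coupling constant in Hz for two spins r_m metres apart."""
    return MU0_OVER_4PI * abs(gamma1 * gamma2) * HBAR / (2.0 * math.pi * r_m ** 3)


def distance_from_coupling(d_hz, gamma1, gamma2):
    """Invert the relation above: distance in metres from a coupling in Hz."""
    return (MU0_OVER_4PI * abs(gamma1 * gamma2) * HBAR
            / (2.0 * math.pi * d_hz)) ** (1.0 / 3.0)
```

Because the coupling falls off as the cube of the distance, a modest uncertainty in the measured coupling translates into a much smaller relative uncertainty in the derived distance.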

51. A method of configurational bias Monte Carlo
determination of the structure of a compound having
limited conformational degrees of freedom at a
temperature of interest, the method comprising the steps
of:

(a) generating a proposed structure for the compound by
making conformational alterations consistent with
the conformational degrees of freedom, the
alterations being made to a representation of the
compound's current chemical and conformational
structure to generate a proposed representation;

(b) accepting and storing the proposed structure
according to a probability depending on an energy
determined for the proposed structure; and

(c) repeating these steps until sufficient structures
have been stored to permit statistically significant
determination of an equilibrium structure.

52. The method of claim 51 wherein the conformational degrees
of freedom comprise torsional rotations about mutual
bonds between otherwise rigid subunits of the compound,
each rigid unit's representation comprising its
interconnections and atomic composition, each atom's
representation comprising its type and position, the
torsional rotations respecting any conformational
constraints present.
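
The generate, accept, and repeat cycle of claim 51, restricted to torsional degrees of freedom as in claim 52, can be sketched with a plain Metropolis loop. This is a simplified illustration, not the claimed method in full: it uses unbiased single-torsion moves, and the energy function, step size, and names are assumptions supplied by the caller.

```python
import math
import random


def metropolis_torsions(energy, torsions, kT, n_steps, max_step=0.5, rng=None):
    """Sample torsion-angle vectors at temperature kT.

    energy:   callable mapping a list of torsions (radians) to an energy.
    torsions: starting torsion angles.
    Returns the list of stored structures, one per step.
    """
    rng = rng or random.Random()
    current = list(torsions)
    e_cur = energy(current)
    stored = []
    for _ in range(n_steps):
        # (a) propose: rotate one randomly chosen torsion by a small amount.
        proposal = list(current)
        i = rng.randrange(len(proposal))
        proposal[i] += rng.uniform(-max_step, max_step)
        e_new = energy(proposal)
        # (b) accept with the Metropolis probability min(1, exp(-dE/kT)).
        if e_new <= e_cur or rng.random() < math.exp(-(e_new - e_cur) / kT):
            current, e_cur = proposal, e_new
        # Store the current state (repeated when a move is rejected).
        stored.append(list(current))
    return stored
```

Storing the current state after every step, including rejected moves, is what makes simple averages over the stored structures estimate equilibrium properties at the chosen temperature.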

53. The method of claim 51 wherein the compound is a peptide,
peptide derivative, or peptide analog. 

54. The method of claim 51 wherein the conformational
alterations comprise constrained, concerted torsional
rotations.

55. The method of claim 54 wherein the constrained, concerted 
torsional rotations are constrained so that no more than 
four rigid units are spatially displaced. 

56. The method of claim 51 wherein the conformational 
alterations comprise removal of a side chain and regrowth 
of the side chain with a new torsional conformation. 

57. The method of claim 51 wherein the proposed structures
are generated with a bias toward more acceptable 
configurations of lower energy. 
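
The bias toward lower-energy configurations, combined with the side chain regrowth of claim 56, can be illustrated with a Rosenbluth-style regrowth move. The sketch below is a generic textbook form of configurational bias Monte Carlo, offered under assumptions and not taken from the program listings: trial torsions are drawn uniformly, `segment_energy(chain, angle)` is a hypothetical callable returning the incremental energy of adding one torsion to the torsions already grown, and all other names are illustrative.

```python
import math
import random


def grow(segment_energy, n_torsions, k_trials, kT, rng, old=None):
    """Grow a new chain (old=None) or retrace an existing one (old given).

    Returns (torsions, Rosenbluth weight).
    """
    chain, weight = [], 1.0
    for i in range(n_torsions):
        trials = [rng.uniform(-math.pi, math.pi) for _ in range(k_trials)]
        if old is not None:
            trials[0] = old[i]  # the existing angle takes the place of one trial
        boltz = [math.exp(-segment_energy(chain, t) / kT) for t in trials]
        total = sum(boltz)
        if total == 0.0:
            return chain, 0.0   # dead end: every trial is forbidden
        weight *= total
        if old is not None:
            chain.append(old[i])
        else:
            # Bias: pick a trial with probability proportional to its
            # Boltzmann factor, favoring lower-energy torsions.
            chain.append(rng.choices(trials, weights=boltz)[0])
    return chain, weight


def cbmc_regrow(segment_energy, old_chain, k_trials, kT, rng):
    """Remove and regrow a chain; accept with min(1, W_new / W_old)."""
    n = len(old_chain)
    new_chain, w_new = grow(segment_energy, n, k_trials, kT, rng)
    if w_new == 0.0:
        return old_chain, False
    _, w_old = grow(segment_energy, n, k_trials, kT, rng, old=old_chain)
    if rng.random() < min(1.0, w_new / w_old):
        return new_chain, True
    return old_chain, False
```

The ratio of Rosenbluth weights in the acceptance test compensates for the bias introduced when choosing trial torsions, so the stored structures still follow the Boltzmann distribution while far fewer proposals are wasted on high-energy conformations.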

58. The method of claim 51 wherein the energy is determined
for the proposed structure by a method comprising
including constraint terms which represent knowledge of
measured structure for the compound.

59. The method of claim 58 wherein the constraint terms
comprise a weighted sum of squares of differences of the
actual and measured structures.

60. The method of claim 51 applied to a plurality of
compounds of limited conformational degrees of freedom
all of which bind to the same target molecule wherein the
method further comprises a step of selecting a plurality
of candidate pharmacophores based on rules of chemical
homology.

61. The method of claim 60 wherein the energy is determined
for the proposed structure of one of the plurality of
compounds by a method comprising including consensus
terms which represent knowledge that the compounds all
bind to the same target molecule.

62. The method of claim 61 wherein the consensus terms are a
weighted sum of squares of differences in the atomic
positions of a candidate pharmacophore of said one of the
plurality of compounds from the average values of these
positions in all the compounds.

63. The method of claim 61 which further comprises a step of
determining a consensus pharmacophore structure by
determining from the plurality of selected candidate
pharmacophores that candidate pharmacophore for which the
consensus terms are minimum compared to other candidate
pharmacophores.



64. The method of claim 60 which further comprises a step of 
determining a consensus pharmacophore structure by 
determining from the plurality of selected candidate 
pharmacophores that candidate pharmacophore for which the 
consensus terms are relatively small compared to the 
total energy. 
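
The two selection criteria of claims 63 and 64 can be contrasted in a short sketch. Reading "relatively small compared to the total energy" as the ratio of consensus energy to the magnitude of total energy is an interpretation made here for illustration; the tuple layout and names are likewise assumptions.

```python
def select_consensus(candidates, relative=False):
    """Pick a consensus pharmacophore from candidate results.

    candidates: list of (name, consensus_energy, total_energy) tuples.
    relative:   False selects the minimum consensus energy (claim 63 style);
                True selects the smallest consensus/|total| ratio, one
                possible reading of claim 64.
    """
    def score(candidate):
        _, consensus, total = candidate
        if not relative:
            return consensus
        if total == 0.0:
            raise ValueError("total energy must be non-zero for a ratio")
        return consensus / abs(total)

    return min(candidates, key=score)[0]


# Hypothetical results for two candidate pharmacophores.
results = [("A", 2.0, 100.0), ("B", 1.5, 10.0)]
best_absolute = select_consensus(results)                  # "B"
best_relative = select_consensus(results, relative=True)   # "A"
```

The two criteria can disagree, as in the example, when a candidate has a slightly larger consensus penalty but a much larger total energy scale.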



65. The method of claim 63 or 64 which further comprises a
step of determining one or more lead compounds for use as
a drug which share a pharmacophore specification with the
determined consensus pharmacophore structure.

66. The method of claim 51 wherein the equilibrium structure
is determined by a method comprising averaging selected
generated and accepted structures.

67. The method of claim 66 wherein the averaging of
structures comprises clustering selected generated and
accepted structures into sets of similar structures and
averaging these sets.

68. An apparatus for configurational bias Monte Carlo
determination of the structure of a compound having
limited conformational degrees of freedom at a
temperature of interest, the apparatus comprising:

(a) memory means for storing
(i) data structures representing the compound's
chemical and conformational structure
consistently with the compound's degrees of
freedom,
(ii) similar data structures representing the
compound's proposed structure and prior
structures, and
(iii) parameters representing atomic interactions,
and

(b) processor means for executing programs for
(i) generating a proposed structure by making
conformational alterations consistent with the
conformational degrees of freedom and with a
bias toward more acceptable configurations of
lower energy,
(ii) accepting and storing the proposed structure
according to a probability depending on an
energy determined for the proposed structure,
and
(iii) repeating these steps until sufficient
structures have been stored to permit
statistically significant determination of an
equilibrium structure.

69. The apparatus of claim 68 wherein the conformational
degrees of freedom comprise torsional rotations about
mutual bonds between otherwise rigid subunits of the
compound, each rigid unit's representation comprising its
interconnections and atomic composition, each atom's
representation comprising its type and position, the
torsional rotations respecting any conformational
constraints present.
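
The stored representation recited in claims 68 and 69 (rigid units with their interconnections and atomic composition, atoms with type and position) could be laid out as in the sketch below. The class and field names are hypothetical and chosen only to mirror the claim language; the actual data structures are those of the program listings in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    atom_type: str                         # e.g. "C", "N", "O"
    position: tuple[float, float, float]   # Cartesian coordinates


@dataclass
class RigidUnit:
    atoms: list[Atom]                                   # atomic composition
    bonds_to: list[int] = field(default_factory=list)   # indices of bonded units


@dataclass
class Compound:
    units: list[RigidUnit]
    # Torsion angle (radians) about each bond joining two rigid units.
    torsions: dict[tuple[int, int], float] = field(default_factory=dict)

    def rotatable_bonds(self):
        """Unique unit-to-unit bonds, each a candidate torsional rotation."""
        return sorted({(min(i, j), max(i, j))
                       for i, unit in enumerate(self.units)
                       for j in unit.bonds_to})
```

Keeping the current, proposed, and prior structures as separate instances of the same layout matches the claim's "similar data structures" and makes accepting a proposal a simple swap.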

70. The apparatus of claim 68 wherein the compound is a
peptide, peptide derivative, or peptide analog. 

71. The apparatus of claim 68 wherein the memory, processor,
and control means are configured from a workstation type
digital computer comprising RAM memory, disk memory,
processor, and input and display devices.

72. The apparatus of claim 68 wherein the conformational
alterations made by the processor means further comprise
constrained, concerted torsional rotations or removal of
a side chain and regrowth of the side chain with a new
torsional conformation.

73. The apparatus of claim 72 wherein the constrained,
concerted torsional rotations are constrained so that no
more than four rigid units are spatially displaced.

74. The apparatus of claim 68 wherein the processor means
determines an energy for the proposed structure by a
method comprising including constraint terms which
represent knowledge of measured structure for the
compound.

75. The apparatus of claim 74 wherein the constraint terms
comprise a weighted sum of squares of differences of the
actual and measured structures.

76. The apparatus of claim 68 applied to a plurality of
compounds of limited conformational degrees of freedom
all of which bind to the same target molecule, and
wherein the processor means further comprises programs
for selecting a plurality of candidate pharmacophores
based on rules of chemical homology.

77. The apparatus of claim 76 wherein the processor means
determines an energy for the proposed structure of any
one compound by a method comprising including consensus
terms which represent knowledge that the compounds all
bind to the same target molecule.

78. The apparatus of claim 77 wherein the consensus terms are
a weighted sum of squares of differences in the atomic
positions of the candidate pharmacophore of said one
compound from the average values of these positions in
all the compounds.

79. The apparatus of claim 77 wherein the processor means
further comprises programs for determining a consensus
pharmacophore structure by determining from the plurality
of selected candidate pharmacophores a candidate
pharmacophore for which the consensus terms are minimum
compared to other candidate pharmacophores.

80. The apparatus of claim 77 wherein the processor means
further comprises programs for determining a consensus
pharmacophore structure by determining from the plurality
of selected candidate pharmacophores a candidate
pharmacophore for which the consensus terms are
relatively small compared to the total energy.

81. The apparatus of claim 79 or 80 wherein the processor
means further comprises programs for determining one or
more lead compounds for use as a drug that share a
pharmacophore specification with the consensus
pharmacophore structure.

82. The apparatus of claim 68 wherein the processor means
determines an equilibrium structure by a method
comprising averaging selected generated and accepted
structures.

83. The apparatus of claim 82 wherein the averaging of
structures further comprises clustering selected
generated and accepted structures into sets of similar
structures and averaging these sets.

84. In a digital computer, apparatus for configurational bias
Monte Carlo determination of the structure of at least
one compound having limited conformational degrees of
freedom at a temperature of interest, said apparatus
comprising:

(a) first memory means for storing data structures
representing the compound's chemical and
conformational structure consistently with the
compound's degrees of freedom,

(b) second memory means for storing similar data
structures representing the compound's proposed
structure,

(c) third memory means for storing similar data
structures representing the compound's prior
structures,

(d) first processor means for generating a proposed
structure by making conformational alterations
consistent with the conformational degrees of
freedom and with a bias toward conformations of
lower energy,

(e) second processor means for accepting and storing the
proposed structure according to a probability
depending on an energy determined for the proposed
structure, and

(f) third processor means for controlling and repeating
the generation and acceptance until sufficient
structures have been stored to permit statistically
significant determination of an equilibrium
structure.

85. The digital computer apparatus of claim 84 wherein the
conformational degrees of freedom comprise torsional
rotations about mutual bonds between otherwise rigid
subunits of the compound, each rigid unit's
representation comprising its interconnections and atomic
composition, each atom's representation comprising its
type and position, the torsional rotations respecting any
conformational constraints present.

86. The digital computer apparatus of claim 84 wherein the 
compound is a peptide, peptide derivative, or peptide 
analog. 

87. The digital computer apparatus of claim 84 wherein the
digital computer is a workstation type digital computer 
comprising RAM memory, disk memory, processor, and input 
and display devices. 

88. The digital computer apparatus of claim 84 wherein the
conformational alterations generated by the first 
processor means comprise constrained, concerted torsional 
rotations or removal of a side chain and regrowth of the 
side chain with a new torsional conformation. 



89. The digital computer apparatus of claim 88 wherein the 
constrained, concerted torsional rotations are 
constrained so that no more than four rigid units are 
spatially displaced. 



90. The digital computer apparatus of claim 84 wherein the
second processor means determines an energy for the
proposed structure by a method comprising including
constraint terms which represent knowledge of measured
structure for the compound.

91. The digital computer apparatus of claim 90 wherein the
constraint terms comprise a weighted sum of squares of
differences of the actual and measured structures.

92. The digital computer apparatus of claim 84 in which said
at least one compound is a plurality of compounds of
limited conformational degrees of freedom all of which
bind to the same target and wherein data are stored in
said first memory means representing the chemical and
conformational structure of said plurality of compounds
and wherein the apparatus further comprises additional
processor means for selecting a plurality of candidate
pharmacophores based on rules of chemical homology.

93. The digital computer apparatus of claim 92 wherein the
second processor means determines an energy for the
proposed structure of one of said plurality of compounds
by a method comprising including consensus terms which
represent knowledge that the compounds all bind to the
same target molecule.

94. The digital computer apparatus of claim 92 wherein the
consensus terms are a weighted sum of squares of
differences in the atomic positions of a candidate
pharmacophore of said one of the plurality of compounds
from the average values of these positions in all the
compounds.

95. The digital computer apparatus of claim 93 wherein the
apparatus further comprises processor means for
determining a consensus pharmacophore structure by
determining from the plurality of selected candidate
pharmacophores a candidate pharmacophore for which the
consensus terms are relatively small compared to the
total energy.

96. The digital computer apparatus of claim 93 wherein the
apparatus further comprises processor means for
determining a consensus pharmacophore structure by
determining from the plurality of selected candidate
pharmacophores a candidate pharmacophore for which the
consensus terms are minimum compared to other candidate
pharmacophores.

97. The digital computer apparatus of claims 95 or 96 wherein
the apparatus further comprises processor means for
determining one or more lead compounds for use as a drug
that share a pharmacophore specification with the
consensus pharmacophore structure.

98. The digital computer apparatus of claim 84 wherein the
third processor means determines an equilibrium structure 
by a method comprising averaging selected generated and 
accepted structures. 

99. The digital computer apparatus of claim 98 wherein the
averaging of structures comprises clustering selected 
generated and accepted structures into sets of similar 
structures and averaging these sets. 

100. In a digital computer, apparatus for configurational bias
Monte Carlo determination of the structure of a plurality
of compounds having limited conformational degrees of
freedom, each compound having a backbone and side chains,
said apparatus comprising:

(a) first memory means for storing data structures
representing each compound's chemical and
conformational structure consistently with that
compound's degrees of freedom and constraints,

(b) second memory means for storing similar data
structures representing a proposed structure for one
or more of the compounds,

(c) third memory means for storing similar data
structures representing prior structures of the
plurality of compounds,

(d) first processor means for generating a proposed
structure of a randomly selected compound by making
conformational alterations consistent with the
conformational degrees of freedom, the
conformational alterations being randomly
distributed between alterations that alter the
structure of a randomly selected side chain of the
selected compound and alterations that alter the
structure of a randomly selected region of the
backbone of the selected compound, the proposed
structure being stored in the second memory means,
the proposed structure being generated with a bias
toward more acceptable structures of lower energy,
whereby the method is made more efficient,

(e) second processor means for accepting a proposed
structure according to a probability depending on an
energy determined for the proposed structure, the
energy including terms representing physical
interactions and terms representing heuristic
information about the compound's structure, the
heuristic information comprising knowledge about
measured distances in one or more compounds of said
plurality and about the plurality of the compounds
binding to a same target molecule,

(f) third processor means for controlling and repeating
these steps until sufficient structures have been
generated and accepted to permit statistically
significant determination of an equilibrium
structure.

101. The digital computer of claim 100 wherein the
conformational degrees of freedom comprise torsional
rotations about mutual bonds between otherwise rigid
subunits of the compound, each rigid unit's
representation comprising its interconnections and atomic
composition, each atom's representation comprising its
type and position, the torsional rotations respecting any
conformational constraints present.

102. The digital computer of claim 100 wherein the compound is
a peptide, peptide derivative, or peptide analog. 



103. A method of configurational bias Monte Carlo
determination of the structure of a compound selected
from the group consisting of a peptide, peptide
derivative, and peptide analog, the method comprising the
steps of:

(a) representing the conformation of the compound by
interconnected rigid units capable of torsional
rotation about common bonds, each rigid unit's
representation comprising its interconnections and
atomic composition, each atom's representation
comprising its type and position,

(b) generating a proposed structure by making
conformational alterations consistent with the
compound's structure,

(c) accepting a proposed structure according to a
probability depending on an energy determined for
the proposed structure, and

(d) repeating these steps until sufficient structures
have been generated and accepted to permit
statistically significant determination of an
equilibrium structure.



104. An apparatus for configurational bias Monte Carlo
determination of the structure of a compound selected
from the group consisting of a peptide, peptide
derivative, and peptide analog, the apparatus comprising:

(a) memory means for storing
(i) data structures representing the compound's
conformation as interconnected rigid units
capable of torsional rotation about common
bonds, each rigid unit's representation
comprising its interconnections and atomic
composition, each atom's representation
comprising its type and position,
(ii) similar data structures representing the
compound's proposed structure and prior
structures, and
(iii) parameters representing atomic interactions,
and

(b) processor means for executing programs for
(i) generating a proposed structure by making
conformational alterations consistent with the
compound's structure and with a bias toward
more acceptable configurations of lower energy,
(ii) accepting a proposed structure according to a
probability depending on an energy determined
for the proposed structure, and
(iii) repeating these steps until sufficient
structures have been generated and accepted to
permit statistically significant determination
of an equilibrium structure.



105. In a digital computer, apparatus for configurational bias
Monte Carlo determination of the structure of a compound
selected from the group consisting of a peptide, peptide
derivative, and peptide analog, said apparatus
comprising:

(a) first memory means for storing data structures
representing the compound's structure as
interconnected rigid units capable of torsional
rotation about common bonds, each rigid unit's
representation comprising its interconnections and
atomic composition, each atom's representation
comprising its type and position,

(b) second memory means for storing similar data
structures representing the compound's proposed
structure,

(c) third memory means for storing similar data
structures representing the compound's prior
structures,

(d) first processor means for generating a proposed
structure by making conformational alterations
consistent with the compound's structure and
constraints and with a bias toward conformations of
lower energy,

(e) second processor means for accepting a proposed
structure according to a probability depending on an
energy determined for the proposed structure, and

(f) third processor means for controlling and repeating
these steps until sufficient structures have been
generated and accepted to permit statistically
significant determination of an equilibrium
structure.

106. In a digital computer, apparatus for configurational bias
Monte Carlo determination of the structure of a plurality
of compounds selected from the group consisting of
peptides, peptide derivatives, and peptide analogs, each
compound having a backbone and side chains, said
apparatus comprising:

(a) first memory means for storing data structures
representing each compound's structure as
interconnected rigid units capable of torsional
rotation about common bonds, each rigid unit's
representation comprising its interconnections and
atomic composition, each atom's representation
comprising its type and position,

(b) second memory means for storing similar data
structures representing a proposed structure for one
or more of the compounds,

(c) third memory means for storing similar data
structures representing prior structures of the
plurality of the compounds,

(d) first processor means for generating a proposed
structure of a randomly selected compound by making
conformational alterations consistent with the
compound's structure, the conformational alterations
being randomly distributed between alterations that
alter the structure of a randomly selected side
chain of the selected compound and alterations that
alter the structure of a randomly selected region of
the backbone of the selected compound, the proposed
structure being stored in the second memory means,
the proposed structure being generated with a bias
toward more acceptable structures of lower energy,

(e) second processor means for accepting a proposed
structure according to a probability depending on an
energy determined for the proposed structure, the
energy including terms representing physical
interactions and terms representing heuristic
information about the compound's structure, the
heuristic information comprising knowledge about
measured distances in one or more compounds of said
plurality and about the plurality of the compounds
binding to a same target molecule,

(f) third processor means for controlling and repeating
these steps until sufficient structures have been
generated and accepted to permit statistically
significant determination of an equilibrium
structure.

107. The method of claim 42 wherein the nuclear magnetic
resonance is rotational echo double resonance.

108. The method of claim 1 wherein the diversity libraries
are structurally constrained organic diversity libraries.

[...] pharmacophore generation of [...] NMR or X-ray
crystallography. In addition, the method of Group I as claimed does
not require the method of Group II as claimed, as the dependent claim
of Group I which specifies the Monte Carlo method (claim 24) [...]
structure based on data for diversity libraries [...] does not require
the use of diversity libraries. Thus, Groups II and III lack the
special technical feature of Group I, i.e. using a diversity [...]

[...] Group II is drawn to a method of making NMR measurements.
[...] Thus, Groups II and III do [...]

Groups I and II are related to Group IV as separate methods and
products, as the methods of Groups I and II as claimed do not require
the apparatus of Group IV as claimed.

[...] are related as separate method and product. The method of
Group III as claimed does not require the [...] methods other [...]
the method of Group III, such as generating structures using NMR data
obtained from a compound in solution.